GavinRay97 opened a new issue #12618: URL: https://github.com/apache/arrow/issues/12618
Based on feedback from the mailing list thread here:
- https://lists.apache.org/thread/852btc8tg5gyxglzkrmddts237fpwk8y

The idea is a higher-level API wrapping `VectorSchemaRoot` and `FieldVector` that uses Java objects and a row-oriented style for familiarity, along with some utilities for manipulating `DataFrame`s (e.g., combining rows from multiple frames with the same schema, converting to a FlightSQL "GetTables" `Schema` object, etc.).

I believe this would be tremendously valuable.

Below is an example of a quickly-thrown-together rough idea, just to get the conversation started:
- Full code available at this gist: https://gist.github.com/GavinRay97/c0434574b4516f55da1eebfd4c1519b6
- This code is probably pretty poor and likely doesn't follow Arrow best-practices

## Example Usage

```java
class DataFrameTest {
    public static void main(String[] args) {
        DataFrame df = DataFrame.create();
        df.addColumn("name", MinorType.VARCHAR, false);
        df.addColumn("age", MinorType.INT, false);
        df.addColumn("weight", MinorType.FLOAT4, false);
        df.addRow(Map.of("name", "Alice", "age", 21, "weight", 50.0));
        df.addRow(Map.of("name", "Bob", "age", 30, "weight", 60.0));

        System.out.println("======= User DataFrame -> VectorSchemaRoot (TSV) =======");
        VectorSchemaRoot root = df.toArrowVectorSchemaRoot();
        System.out.println(root.contentToTSVString());
        assert (root.getRowCount() == 2) : "Expected 2 rows";
        assert (root.getSchema().getFields().size() == 3) : "Expected 3 columns";

        DataFrame roundtrip = DataFrame.fromArrowVectorSchemaRoot(root);
        assert (df.equals(roundtrip)) : "DataFrame equality failed";
        System.out.println("======= Roundtrip (DF -> VectorSchemaRoot -> DF) =======");
        System.out.println(roundtrip + "\n");

        System.out.println("======= FlightSQL GetTables Schema =======");
        VectorSchemaRoot flightSchema = new FlightSQLGetTablesSchemaPOJO(
                "catalog1", "schema1", "users", "TABLE", df)
                .toArrowVectorSchemaRoot();
        System.out.println(flightSchema.contentToTSVString());

        System.out.println("======= Merge DataFrames =======");
        DataFrame df3 = DataFrame.mergeDataFrames(true, df, roundtrip);
        System.out.println(df3.toArrowVectorSchemaRoot().contentToTSVString());
        assert (df3.rows().size() == df.rows().size() + roundtrip.rows().size()) : "Merge DataFrame failed";
    }
}
```

## Output

```text
======= User DataFrame -> VectorSchemaRoot (TSV) =======
name    age     weight
Alice   21      50.0
Bob     30      60.0

======= Roundtrip (DF -> VectorSchemaRoot -> DF) =======
DataFrame[
  columns=[name: Utf8 not null, age: Int(32, true) not null, weight: FloatingPoint(SINGLE) not null],
  rows=[{name=Alice, weight=50.0, age=21}, {name=Bob, weight=60.0, age=30}]
]

======= FlightSQL GetTables Schema =======
catalog_name    table_schema    db_schema_name  table_name      table_type
catalog1        [B@4bdeaabb     schema1         users           TABLE

======= Merge DataFrames =======
name    age     weight
Alice   21      50.0
Bob     30      60.0
Alice   21      50.0
Bob     30      60.0
```
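
To make the "combine rows from multiple frames with the same schema" utility more concrete, here is a minimal sketch in plain Java with no Arrow dependency. `RowFrame`, its methods, and the choice to reject mismatched schemas are all hypothetical names and assumptions made for illustration; this is not the gist's `DataFrame` and not an Arrow API. A schema here is reduced to an ordered list of column names, so type and nullability checks are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical row-oriented frame: column names plus a list of row maps. */
final class RowFrame {
    private final List<String> columns;
    private final List<Map<String, Object>> rows = new ArrayList<>();

    RowFrame(List<String> columns) {
        this.columns = List.copyOf(columns);
    }

    List<String> columns() {
        return columns;
    }

    /** Returns an immutable snapshot of the rows. */
    List<Map<String, Object>> rows() {
        return List.copyOf(rows);
    }

    /** Adds a row; its keys must match the declared columns exactly. */
    void addRow(Map<String, Object> row) {
        if (!row.keySet().equals(new HashSet<>(columns))) {
            throw new IllegalArgumentException("Row keys " + row.keySet() + " do not match columns " + columns);
        }
        // Store values in declared column order so printing is deterministic.
        Map<String, Object> ordered = new LinkedHashMap<>();
        for (String column : columns) {
            ordered.put(column, row.get(column));
        }
        rows.add(ordered);
    }

    /** Concatenates the rows of all frames; every frame must declare the same columns in the same order. */
    static RowFrame merge(RowFrame first, RowFrame... rest) {
        RowFrame merged = new RowFrame(first.columns);
        for (Map<String, Object> row : first.rows) {
            merged.rows.add(new LinkedHashMap<>(row));
        }
        for (RowFrame frame : rest) {
            if (!frame.columns.equals(first.columns)) {
                throw new IllegalArgumentException("Schema mismatch: " + frame.columns + " vs " + first.columns);
            }
            for (Map<String, Object> row : frame.rows) {
                merged.rows.add(new LinkedHashMap<>(row));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        RowFrame a = new RowFrame(List.of("name", "age"));
        a.addRow(Map.of("name", "Alice", "age", 21));
        RowFrame b = new RowFrame(List.of("name", "age"));
        b.addRow(Map.of("name", "Bob", "age", 30));
        System.out.println(RowFrame.merge(a, b).rows());
    }
}
```

Failing fast on a schema mismatch is one possible design; a real implementation could instead compare Arrow `Field` objects or offer a lenient mode, which may be what the boolean argument to `mergeDataFrames` in the example controls (an assumption, since the issue does not say).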
