GavinRay97 opened a new issue #12618:
URL: https://github.com/apache/arrow/issues/12618


   Based on feedback from mailing list thread here:
   - https://lists.apache.org/thread/852btc8tg5gyxglzkrmddts237fpwk8y
   
   The idea is a higher-level API wrapping `VectorSchemaRoot` and `FieldVector` that uses Java objects and a row-oriented style for familiarity.
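
To make the row-oriented idea concrete, here is a minimal sketch of a facade that accepts rows as `Map`s but stores values column by column. It is not the code from the gist and uses no Arrow classes: plain `List`s stand in for `FieldVector`s, and the class name `RowFacadeSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration: plain lists stand in for Arrow FieldVectors.
final class RowFacadeSketch {
    // One value list per column, kept in declaration order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = 0;

    void addColumn(String name) {
        if (rowCount > 0) {
            throw new IllegalStateException("columns must be declared before rows are added");
        }
        if (columns.putIfAbsent(name, new ArrayList<>()) != null) {
            throw new IllegalArgumentException("duplicate column: " + name);
        }
    }

    // Accepts a row-shaped value and appends one cell to every column.
    void addRow(Map<String, Object> row) {
        if (!row.keySet().equals(columns.keySet())) {
            throw new IllegalArgumentException(
                    "row keys " + row.keySet() + " do not match columns " + columns.keySet());
        }
        for (Map.Entry<String, List<Object>> column : columns.entrySet()) {
            column.getValue().add(row.get(column.getKey()));
        }
        rowCount++;
    }

    // Rebuilds a row view by reading the same index from every column.
    Map<String, Object> row(int index) {
        Objects.checkIndex(index, rowCount);
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> column : columns.entrySet()) {
            row.put(column.getKey(), column.getValue().get(index));
        }
        return row;
    }

    // Read-only view of a single column's values.
    List<Object> column(String name) {
        List<Object> values = columns.get(name);
        if (values == null) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return Collections.unmodifiableList(values);
    }

    int rowCount() {
        return rowCount;
    }
}
```

A real wrapper would also have to carry Arrow types and nullability per column and manage allocator-backed memory, all of which this sketch leaves out.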
   
   It would also include some utilities for manipulating `DataFrame`s (e.g., combining rows from multiple frames with the same schema, converting to a FlightSQL "GetTables" `Schema` object, etc.).
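
As a rough illustration of the "combine rows from multiple frames with the same schema" utility, the sketch below appends rows only when the column names match. The names `FrameMergeSketch`, `Frame`, and `merge` are hypothetical and not taken from the gist, and the schema check compares column names only, whereas a real implementation would presumably compare Arrow `Schema` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of merging row-oriented frames; not the gist's implementation.
final class FrameMergeSketch {
    // Minimal immutable stand-in for a row-oriented frame.
    static final class Frame {
        final List<String> columnNames;
        final List<Map<String, Object>> rows;

        Frame(List<String> columnNames, List<Map<String, Object>> rows) {
            this.columnNames = List.copyOf(columnNames);
            this.rows = List.copyOf(rows);
        }
    }

    // Appends the rows of every frame in order, rejecting any frame whose
    // column names differ from those of the first frame.
    static Frame merge(Frame first, Frame... rest) {
        List<Map<String, Object>> merged = new ArrayList<>(first.rows);
        for (Frame next : rest) {
            if (!next.columnNames.equals(first.columnNames)) {
                throw new IllegalArgumentException(
                        "schema mismatch: " + first.columnNames + " vs " + next.columnNames);
            }
            merged.addAll(next.rows);
        }
        return new Frame(first.columnNames, merged);
    }
}
```

Failing fast on a mismatch is one possible design choice; the alternative of filling missing columns with nulls would require every column to be nullable.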
   
   I believe this would be tremendously valuable.
   
   Below is an example of a quickly-thrown-together rough idea, just to get the 
conversation started:
   - Full code available at this gist: https://gist.github.com/GavinRay97/c0434574b4516f55da1eebfd4c1519b6
   - This code is probably pretty poor and likely doesn't follow Arrow best practices
   
   ## Example Usage
   
   ```java
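   import java.util.Map;

   import org.apache.arrow.vector.VectorSchemaRoot;
   import org.apache.arrow.vector.types.Types.MinorType;

   // DataFrame and FlightSQLGetTablesSchemaPOJO are defined in the gist linked above.
   class DataFrameTest {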
       public static void main(String[] args) {
           DataFrame df = DataFrame.create();
   
           df.addColumn("name", MinorType.VARCHAR, false);
           df.addColumn("age", MinorType.INT, false);
           df.addColumn("weight", MinorType.FLOAT4, false);
   
           df.addRow(Map.of("name", "Alice", "age", 21, "weight", 50.0));
           df.addRow(Map.of("name", "Bob", "age", 30, "weight", 60.0));
   
           System.out.println("======= User DataFrame -> VectorSchemaRoot (TSV) =======");
           VectorSchemaRoot root = df.toArrowVectorSchemaRoot();
           System.out.println(root.contentToTSVString());
           assert (root.getRowCount() == 2) : "Expected 2 rows";
           assert (root.getSchema().getFields().size() == 3) : "Expected 3 columns";
   
           DataFrame roundtrip = DataFrame.fromArrowVectorSchemaRoot(root);
           assert (df.equals(roundtrip)) : "DataFrame equality failed";
   
           System.out.println("======= Roundtrip (DF -> VectorSchemaRoot -> DF) =======");
           System.out.println(roundtrip + "\n");
   
           System.out.println("======= FlightSQL GetTables Schema =======");
           VectorSchemaRoot flightSchema = new FlightSQLGetTablesSchemaPOJO(
                   "catalog1", "schema1", "users", "TABLE", df)
                   .toArrowVectorSchemaRoot();
           System.out.println(flightSchema.contentToTSVString());
   
           System.out.println("======= Merge DataFrames =======");
           DataFrame df3 = DataFrame.mergeDataFrames(true, df, roundtrip);
           System.out.println(df3.toArrowVectorSchemaRoot().contentToTSVString());
           assert (df3.rows().size() == df.rows().size() + roundtrip.rows().size())
                   : "Merge DataFrame failed";
       }
   }
   ```
   
   ## Output
   
   ```text
   ======= User DataFrame -> VectorSchemaRoot (TSV) =======
   name age     weight
   Alice        21      50.0
   Bob  30      60.0
   
   ======= Roundtrip (DF -> VectorSchemaRoot -> DF) =======
   DataFrame[
     columns=[name: Utf8 not null, age: Int(32, true) not null, weight: FloatingPoint(SINGLE) not null],
     rows=[{name=Alice, weight=50.0, age=21}, {name=Bob, weight=60.0, age=30}]
   ]
   
   ======= FlightSQL GetTables Schema =======
   catalog_name table_schema    db_schema_name  table_name      table_type
   catalog1     [B@4bdeaabb     schema1 users   TABLE
   
   ======= Merge DataFrames =======
   name age     weight
   Alice        21      50.0
   Bob  30      60.0
   Alice        21      50.0
   Bob  30      60.0
   ```
   
   
   

