westonpace commented on issue #10885:
URL: https://github.com/apache/arrow/issues/10885#issuecomment-894450914


   The Java type `org.apache.arrow.vector.types.pojo.Schema` should be the same 
concept (not necessarily the same memory) as the C++ type `arrow::Schema` or the 
Python type `pyarrow.Schema`.
   
   In the Arrow columnar format, the schema is defined in the Flatbuffers file here: 
https://github.com/apache/arrow/blob/master/format/Schema.fbs
   
   Parquet also has a schema concept which is generally compatible with Arrow's 
schema.
   
   For sharing the schema between languages or processes there are a few 
serialization choices:
   
   * You can save an empty table in the IPC file format (also known as Feather V2)
     * This is probably the best choice for file storage of a schema
   * You can save an empty table in the Parquet format
   * You can use the [C data interface](https://arrow.apache.org/docs/format/CDataInterface.html#the-arrowschema-structure), 
     which defines a common memory representation of a schema (among other things)
     * This is probably the best choice when you don't want to use a file
   
   Java currently supports the IPC format, so you should be able to read and 
write IPC files with empty tables.  That should allow you to save, restore, and 
transfer a schema.  The same approach works across languages: you can save the 
schema in Java and load it in C++, using either a temporary file or a shared buffer.
   
   There is work in progress to add the C data interface to Java.  This will 
allow you to copy schemas back and forth between Java and C++ without going 
through an intermediate file or byte array.



