westonpace commented on issue #10885: URL: https://github.com/apache/arrow/issues/10885#issuecomment-894450914
The Java type `org.apache.arrow.vector.types.pojo.Schema` represents the same concept (though not necessarily the same memory layout) as the C++ type `arrow::Schema` and the Python type `pyarrow.Schema`. In the Arrow columnar format (the Flatbuffers reference), the schema is defined here: https://github.com/apache/arrow/blob/master/format/Schema.fbs

Parquet also has a schema concept, which is generally compatible with Arrow's schema.

For sharing a schema between languages or processes, there are a few serialization choices:

* Save an empty table in the IPC format (sometimes called Feather).
  * This is probably the best choice for storing a schema in a file.
* Save an empty table in the Parquet format.
* Use the [C data interface](https://arrow.apache.org/docs/format/CDataInterface.html#the-arrowschema-structure), which defines a common in-memory representation of a schema (among other things).
  * This is probably the best choice when you don't want to use a file.

Java currently supports the IPC format, so you should be able to read and write IPC files containing empty tables. That lets you save, restore, and transfer a schema. You can also save the schema in Java and load it in C++, using either a temporary file or a shared buffer.

There is work in progress to add the C data interface to Java. This will let you copy schemas back and forth between Java and C++ without going through an intermediate file or byte array.
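The IPC approach can be sketched in Java as follows. This is a minimal sketch, not an official recipe. It assumes a reasonably recent Arrow Java release, with `arrow-vector` and a memory implementation such as `arrow-memory-netty` on the classpath. Newer JDKs may also need extra JVM flags such as `--add-opens=java.base/java.nio=ALL-UNNAMED`.

The sketch does two things. It writes a schema-only (empty) table to an IPC file. It also writes the same empty table in the IPC streaming format to a `byte[]`, which could serve as the shared buffer handed to C++. It then reads each one back. The class name, method names and example schema (`SchemaIpcSketch`, `writeSchemaFile`, `sampleSchema`, etc.) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.util.Arrays;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowFileReader;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

public class SchemaIpcSketch {

  /** Hypothetical example schema: a non-null int32 column and a nullable UTF-8 column. */
  public static Schema sampleSchema() {
    return new Schema(Arrays.asList(
        new Field("id", FieldType.notNullable(new ArrowType.Int(32, true)), null),
        new Field("name", FieldType.nullable(new ArrowType.Utf8()), null)));
  }

  /** Writes an empty table (schema only, no record batches) in the IPC file format. */
  public static void writeSchemaFile(Schema schema, File file) throws IOException {
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
         FileOutputStream out = new FileOutputStream(file);
         // null dictionary provider: the sample schema has no dictionary-encoded fields
         ArrowFileWriter writer = new ArrowFileWriter(root, null, out.getChannel())) {
      writer.start(); // magic bytes + schema message
      writer.end();   // footer; writeBatch() is never called, so the table is empty
    }
  }

  /** Reads only the schema back from an IPC file. */
  public static Schema readSchemaFile(File file) throws IOException {
    try (BufferAllocator allocator = new RootAllocator();
         FileInputStream in = new FileInputStream(file);
         ArrowFileReader reader = new ArrowFileReader(in.getChannel(), allocator)) {
      // Schema is an immutable POJO, so it stays valid after the reader is closed.
      return reader.getVectorSchemaRoot().getSchema();
    }
  }

  /** Serializes an empty table in the IPC streaming format into a byte array. */
  public static byte[] toStreamBytes(Schema schema) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (BufferAllocator allocator = new RootAllocator();
         VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
         ArrowStreamWriter writer = new ArrowStreamWriter(root, null, Channels.newChannel(out))) {
      writer.start(); // schema message
      writer.end();   // end-of-stream marker
    }
    return out.toByteArray();
  }

  /** Recovers the schema from bytes produced by toStreamBytes. */
  public static Schema fromStreamBytes(byte[] bytes) throws IOException {
    try (BufferAllocator allocator = new RootAllocator();
         ArrowStreamReader reader = new ArrowStreamReader(new ByteArrayInputStream(bytes), allocator)) {
      return reader.getVectorSchemaRoot().getSchema();
    }
  }

  public static void main(String[] args) throws IOException {
    Schema schema = sampleSchema();

    // File-based round trip (suited to storing a schema on disk).
    File file = File.createTempFile("schema", ".arrow");
    file.deleteOnExit();
    writeSchemaFile(schema, file);
    System.out.println("file round trip equal: " + schema.equals(readSchemaFile(file)));

    // In-memory round trip (the byte[] could be passed to another process or language).
    byte[] bytes = toStreamBytes(schema);
    System.out.println("stream round trip equal: " + schema.equals(fromStreamBytes(bytes)));
  }
}
```

The file variant suits storage. The stream variant needs no seekable input, which makes it convenient for an in-memory buffer. On the C++ side, such bytes would be opened with a stream reader, for example `arrow::ipc::RecordBatchStreamReader`.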
