jorgecarleitao commented on issue #1248: URL: https://github.com/apache/arrow-datafusion/issues/1248#issuecomment-962562023
Tangentially related, since I have been investigating something similar: the `RecordBatch` in `arrow-rs` is not defined in the Arrow specification. Specifically, the term "RecordBatch" comes from the IPC specification's [RecordBatch message](https://arrow.apache.org/docs/format/Columnar.html#recordbatch-message), but such a message does not hold a schema, since the schema is written in a separate [Schema message](https://arrow.apache.org/docs/format/Columnar.html#schema-message) at the beginning of the file. The C data interface has no such concept at all.

So, I do not think it is an issue to use something else that fits our needs better. I think that the two requirements for something to be consistent with the Arrow format are:

* data and schema round-trip losslessly over Arrow's IPC boundary
* data and schema round-trip losslessly over Arrow's C data interface boundary at zero cost

I do not think RLE and other encodings fulfill these requirements (not sure whether this is an issue, though).
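To make the separation between schema and data concrete, here is a minimal sketch in plain Rust of a schema-less batch that travels next to a schema written once, mirroring how the IPC layout puts a Schema message ahead of the RecordBatch messages. Every name here (`Field`, `Schema`, `Chunk`, `Stream`, `example_stream`) is hypothetical and chosen for illustration; none of them is an `arrow-rs` or DataFusion API, and the columns are simplified to nullable `i64` values.

```rust
// Hypothetical types for illustration only; not arrow-rs APIs.

/// Logical description of a single column.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Written once per stream, analogous to the IPC Schema message.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

/// A schema-less batch: equal-length columns and nothing else,
/// analogous to the payload of an IPC RecordBatch message.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    columns: Vec<Vec<Option<i64>>>,
}

impl Chunk {
    /// Builds a chunk, rejecting columns of differing lengths.
    fn try_new(columns: Vec<Vec<Option<i64>>>) -> Result<Self, String> {
        let len = columns.first().map_or(0, |c| c.len());
        if columns.iter().any(|c| c.len() != len) {
            return Err("all columns must have the same length".to_string());
        }
        Ok(Self { columns })
    }

    /// Number of rows, taken from the first column (0 if there are none).
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
}

/// One schema followed by any number of chunks.
#[derive(Debug)]
struct Stream {
    schema: Schema,
    chunks: Vec<Chunk>,
}

impl Stream {
    fn new(schema: Schema) -> Self {
        Self { schema, chunks: Vec::new() }
    }

    /// Appends a chunk after checking its column count against the schema.
    fn push(&mut self, chunk: Chunk) -> Result<(), String> {
        if chunk.columns.len() != self.schema.fields.len() {
            return Err(format!(
                "expected {} columns, got {}",
                self.schema.fields.len(),
                chunk.columns.len()
            ));
        }
        self.chunks.push(chunk);
        Ok(())
    }
}

/// Builds a two-column stream holding two chunks of different row counts.
fn example_stream() -> Result<Stream, String> {
    let schema = Schema {
        fields: vec![
            Field { name: "a".to_string(), nullable: false },
            Field { name: "b".to_string(), nullable: true },
        ],
    };
    let mut stream = Stream::new(schema);
    stream.push(Chunk::try_new(vec![vec![Some(1), Some(2)], vec![Some(10), None]])?)?;
    stream.push(Chunk::try_new(vec![vec![Some(3)], vec![Some(30)]])?)?;
    Ok(stream)
}
```

In this sketch the schema is validated once per `push` rather than stored in every batch. That is the design trade-off under discussion, not a claim about how any existing crate implements it.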
