[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #1248: Optimized `RecordBatch` for constant columns

GitBox Sun, 07 Nov 2021 00:03:46 -0700


jorgecarleitao commented on issue #1248:
URL: 
https://github.com/apache/arrow-datafusion/issues/1248#issuecomment-962562023



   Tangentially related, since I have been investigating something similar: the 
`RecordBatch` in `arrow-rs` is not defined in the arrow specification. 
Specifically, the term "RecordBatch" comes from the IPC specification 
[RecordBatch 
message](https://arrow.apache.org/docs/format/Columnar.html#recordbatch-message),
 but such a message does not hold a schema, since the schema is written on a 
separate [Schema 
message](https://arrow.apache.org/docs/format/Columnar.html#schema-message) at 
the beginning of the file. The C data interface has no such concept at all. So, 
I do not think it is an issue to use something else that fits our needs better.
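
   To make the distinction concrete, here is a toy sketch of that split. It is 
not the arrow-rs API: `Schema`, `Message`, `Batch` and `read_stream` are 
hypothetical names, and columns are simplified to `Vec<i64>`. It only shows 
that the wire-level RecordBatch message carries no schema, and that a reader 
attaches the schema from the leading Schema message when it builds its 
in-memory container.

```rust
// Toy model, not the arrow-rs API: every type and function here is hypothetical.

/// Field names only; a real Arrow schema also carries types and metadata.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

/// Messages as they appear in a stream: the schema is its own message.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Schema(Schema),
    /// Column data only; note the absence of a schema.
    RecordBatch { columns: Vec<Vec<i64>> },
}

/// In-memory container that pairs the columns with a schema.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Schema,
    columns: Vec<Vec<i64>>,
}

/// Reads a stream: the first message must be the schema, which is then
/// attached to every batch that follows.
fn read_stream(messages: &[Message]) -> Result<Vec<Batch>, String> {
    let mut iter = messages.iter();
    let schema = match iter.next() {
        Some(Message::Schema(s)) => s.clone(),
        _ => return Err("stream must start with a schema message".to_string()),
    };
    let mut batches = Vec::new();
    for message in iter {
        match message {
            Message::RecordBatch { columns } => batches.push(Batch {
                schema: schema.clone(),
                columns: columns.clone(),
            }),
            Message::Schema(_) => {
                return Err("unexpected second schema message".to_string())
            }
        }
    }
    Ok(batches)
}
```

   In this sketch the pairing of schema and columns is a choice made by the 
reader, not something the message format dictates.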
   
   I think that the two requirements for something to be consistent with the 
arrow format are:
   * data and schema round-trip losslessly over Arrow's IPC boundary
   * data and schema round-trip losslessly over Arrow's C data interface 
boundary at zero cost
   
   I do not think RLE and other encodings fulfill these requirements (not sure 
whether this is an issue, though).
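
   A minimal sketch of that concern, under the assumption that the boundary 
only understands plain arrays. `Column`, `export_plain` and `import_plain` are 
hypothetical names for illustration, not arrow-rs or DataFusion APIs.

```rust
// Toy model; `Column`, `export_plain` and `import_plain` are hypothetical.

#[derive(Debug, Clone, PartialEq)]
enum Column {
    /// One value per row, as a plain array.
    Plain(Vec<i64>),
    /// A single value logically repeated `len` times.
    Constant { value: i64, len: usize },
}

/// Stand-in for crossing a boundary that only accepts plain arrays.
/// A plain column is handed over as-is; a constant column has to be
/// materialized, which costs O(len) time and memory.
fn export_plain(column: Column) -> Vec<i64> {
    match column {
        Column::Plain(values) => values,
        Column::Constant { value, len } => vec![value; len],
    }
}

/// The receiving side sees only values, so it can only rebuild a plain column.
fn import_plain(values: Vec<i64>) -> Column {
    Column::Plain(values)
}
```

   Passing `Column::Constant { value: 7, len: 4 }` through `export_plain` and 
then `import_plain` yields `Column::Plain(vec![7, 7, 7, 7])`: the values 
survive, but the encoding is lost and the export is not zero cost.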


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
