alamb edited a comment on issue #208:
URL: https://github.com/apache/arrow-rs/issues/208#issuecomment-917615261


   > And the issue is that the data buffer points to the original larger array. 
Then, that larger array is ultimately turned into the FlightData which is a 
waste.
   
   Yes, that is the crux of the issue we found. 
   
   > Assuming that's all correct is there a preference as to where a fix should 
be applied? i.e. whether at flight_data_from_arrow_batch, encoded_batch, or 
record_batch_to_bytes?
   
   I am not sure, to be honest, as I am not familiar with the Flight code. 
Perhaps @nevi-me or @jorgecarleitao, who have more experience with how IPC / 
Flight is supposed to work, have thoughts on how to serialize the bytes of an 
Array whose backing `Buffer` is much larger than the array itself. Another 
avenue we could explore is to review how the C++ implementation handles this 
case and/or ask about it on [email protected].
   
   One way to reduce potential unintended side effects could be to make the 
optimization optional (perhaps as an option on 
[`IpcWriteOptions`](https://docs.rs/arrow/5.3.0/arrow/ipc/writer/struct.IpcWriteOptions.html)) 
while we test it out more broadly, and then switch the default value in a 
later version. 
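   
   To make the idea concrete, here is a rough, std-only sketch of the problem 
and of an opt-in fix. It is a toy model, not the arrow-rs API: `SlicedArray`, 
`WriteOptions`, `truncate_sliced_buffers`, and `serialize` are all hypothetical 
names made up for illustration, and a real change would also have to handle 
offsets, validity bitmaps, and nested types.

```rust
use std::sync::Arc;

/// Toy stand-in for an Arrow array: a window (offset, len) over a shared buffer.
/// NOT the arrow-rs API; every name in this sketch is hypothetical.
struct SlicedArray {
    buffer: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl SlicedArray {
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        SlicedArray { buffer: Arc::new(data), offset: 0, len }
    }

    /// Zero-copy slice: the backing buffer is shared, only offset/len change.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        SlicedArray {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }
}

/// Hypothetical writer options; the flag defaults to `false`, so the
/// existing behavior is kept unless the caller opts in.
#[derive(Default)]
struct WriteOptions {
    truncate_sliced_buffers: bool,
}

/// Serialize the array's bytes.
/// Without the flag, the whole backing buffer is written (the wasteful case).
/// With the flag, only the bytes covered by the slice are written.
fn serialize(array: &SlicedArray, options: &WriteOptions) -> Vec<u8> {
    if options.truncate_sliced_buffers {
        array.buffer[array.offset..array.offset + array.len].to_vec()
    } else {
        array.buffer.as_ref().clone()
    }
}

fn main() {
    // A 1024-byte buffer, of which the slice only covers 8 bytes.
    let full = SlicedArray::new((0..=255u8).cycle().take(1024).collect());
    let small = full.slice(100, 8);

    let default_bytes = serialize(&small, &WriteOptions::default());
    let truncated = serialize(&small, &WriteOptions { truncate_sliced_buffers: true });

    println!(
        "default: {} bytes, truncated: {} bytes",
        default_bytes.len(),
        truncated.len()
    );
}
```

   Keeping the flag off by default means existing callers see no change, and 
the default could be flipped later once the truncating path has been exercised 
more broadly.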
   
   > Naively, I was thinking at the record_batch_to_bytes level, but I think 
that might impact IPC in general.
   
   Yes, it would. That may be fine, since the change would optimize the 
serialization of Arrow arrays in general, but I am not sure what the 
expectations are here. 
   
   > Separately, I've been checking whether there are any methods / helpers for 
recreating a RecordBatch out of the data / offsets / len of another RecordBatch.
   
   [`RecordBatch::slice`](https://docs.rs/arrow/5.3.0/arrow/record_batch/struct.RecordBatch.html#method.slice) 
is what I know of for this purpose. (Kudos to @b41sh for adding that one.)



