alamb commented on issue #208: URL: https://github.com/apache/arrow-rs/issues/208#issuecomment-917615261
> And the issue is that the data buffer points to the original larger array. Then, that larger array is ultimately turned into the FlightData, which is a waste.

Yes, that is the crux of the issue we found.

> Assuming that's all correct, is there a preference as to where a fix should be applied? i.e. whether at flight_data_from_arrow_batch, encoded_batch, or record_batch_to_bytes?

I am not sure, to be honest, as I am not familiar with the flight code. Perhaps @nevi-me or @jorgecarleitao, who have more experience in how IPC / Flight is supposed to work, might have thoughts on how to handle serializing bytes for an Array whose backing `Buffer` is much larger than the Array itself.

Another avenue we could explore is to review how the C++ implementation handles this case and/or ask about it on [email protected].

One way to reduce potential unintended side effects would be to make the optimization optional (an option on [`IpcWriteOptions`](https://docs.rs/arrow/5.3.0/arrow/ipc/writer/struct.IpcWriteOptions.html), perhaps) while we test it out more broadly, and then switch the default value in a later version.

> Naively I was thinking at the record_batch_to_bytes level - but I think that might impact IPC in general.

Yes. However, maybe that is OK, as it would be optimizing the serialization of Arrow Arrays in general. I am not sure what the expectations are here, though.

> Separately, I've been looking to see if there are any methods / helpers for recreating a RecordBatch out of the data / offsets / len of another RecordBatch.

`RecordBatch::slice` is what I know of for this purpose: https://docs.rs/arrow/5.3.0/arrow/record_batch/struct.RecordBatch.html#method.slice
