tustvold commented on issue #5150: URL: https://github.com/apache/arrow-rs/issues/5150#issuecomment-1842799426
Ah, I see what the issue is here, thank you for the reproducer.

The problem is that `read_records` will currently return incomplete reads if there isn't sufficient buffer space to accommodate the requested number of records. This is fine for the arrow APIs, as `RecordReader` then grows the buffers and reads out the remaining data. Unfortunately, `RecordReader` is currently crate-private, extremely specific to how the arrow decoding process works, and not really something I would want to expose.

On the flip side, `ColumnWriter` needs to ensure it has complete records, as otherwise `write_mini_batch` might flush a page with a partial record, which, as discussed above, contravenes both the standard and the expectations of many readers.

I will see if I can make `read_records` behave the way it is documented to behave, and never return truncated records :sweat_smile:
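To make the failure mode concrete, here is a minimal, self-contained sketch of the difference between a read that stops when the buffer is full and one that always ends on a record boundary. It is a toy model, not the parquet crate's API: `ToyColumn`, `read_truncating`, and `read_complete` are hypothetical names, and the only assumption borrowed from Parquet is that a repetition level of 0 marks the first value of a new record.

```rust
/// Toy stand-in for a column of repeated values (hypothetical, not the
/// parquet crate's reader). A repetition level of 0 starts a new record.
struct ToyColumn {
    rep_levels: Vec<i16>,
    values: Vec<i32>,
    pos: usize,
}

impl ToyColumn {
    fn new(rep_levels: Vec<i16>, values: Vec<i32>) -> Self {
        assert_eq!(rep_levels.len(), values.len());
        Self { rep_levels, values, pos: 0 }
    }

    /// Counts the values that make up the next `max_records` complete
    /// records, or fewer if the column ends first. Assumes `pos` currently
    /// sits on a record boundary.
    fn values_in_next_records(&self, max_records: usize) -> usize {
        let mut records = 0;
        let mut end = self.pos;
        while end < self.rep_levels.len() {
            if self.rep_levels[end] == 0 {
                if records == max_records {
                    break;
                }
                records += 1;
            }
            end += 1;
        }
        end - self.pos
    }

    /// Truncating behaviour: stops once `capacity` values have been read,
    /// even if that falls in the middle of a record.
    fn read_truncating(&mut self, max_records: usize, capacity: usize, out: &mut Vec<i32>) -> usize {
        let n = self.values_in_next_records(max_records).min(capacity);
        out.extend_from_slice(&self.values[self.pos..self.pos + n]);
        self.pos += n;
        n
    }

    /// Complete-record behaviour: grows the output as needed so the read
    /// always ends on a record boundary.
    fn read_complete(&mut self, max_records: usize, out: &mut Vec<i32>) -> usize {
        let n = self.values_in_next_records(max_records);
        out.reserve(n);
        out.extend_from_slice(&self.values[self.pos..self.pos + n]);
        self.pos += n;
        n
    }
}

fn main() {
    // Two records: [1, 2, 3] and [4, 5].
    let levels = vec![0, 1, 1, 0, 1];
    let values = vec![1, 2, 3, 4, 5];

    // Room for only 4 values: the second record is cut short.
    let mut col = ToyColumn::new(levels.clone(), values.clone());
    let mut truncated = Vec::new();
    let n = col.read_truncating(2, 4, &mut truncated);
    println!("truncating read: {} values {:?}", n, truncated);

    // Growing the output instead keeps both records whole.
    let mut col = ToyColumn::new(levels, values);
    let mut complete = Vec::new();
    let n = col.read_complete(2, &mut complete);
    println!("complete read: {} values {:?}", n, complete);
}
```

In this model, a writer that flushed a page after the truncating read would emit the value `4` without its sibling `5`, which is the partial-record page described above.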
