Re: [I] Reintroduce read_batch in `GenericColumnReader` [arrow-rs]

via GitHub Wed, 06 Dec 2023 04:48:10 -0800


tustvold commented on issue #5150:
URL: https://github.com/apache/arrow-rs/issues/5150#issuecomment-1842799426


   Ah, I see what the issue is here. Thank you for the reproducer.
   
   The problem is that `read_records` currently returns an incomplete read if 
there isn't sufficient buffer space to accommodate the requested number of 
records. This is fine for the arrow APIs, because RecordReader then grows the 
buffers and reads out the remaining data. Unfortunately, RecordReader is 
currently crate-private, extremely specific to how the arrow decoding process 
works, and not really something I would want to expose.
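   
   To make the "grow the buffers and read out the remaining data" pattern 
concrete, here is a minimal sketch. It does not use the parquet crate: 
`ToyReader`, its `read_records`, and `read_requested` are hypothetical 
stand-ins that only model repetition levels (0 marks the start of a record) 
and a short read when the buffer fills up.

```rust
/// Hypothetical toy reader, NOT the parquet crate API.
struct ToyReader {
    rep_levels: Vec<i16>, // a level of 0 marks the start of a new record
    pos: usize,
}

impl ToyReader {
    fn is_exhausted(&self) -> bool {
        self.pos == self.rep_levels.len()
    }

    /// Copies levels into `buf` until `max_records` records are complete or
    /// `buf` is full. Returns (complete records, levels written). It may stop
    /// short, possibly mid-record, when `buf` is too small.
    fn read_records(&mut self, max_records: usize, buf: &mut [i16]) -> (usize, usize) {
        let mut records = 0;
        let mut written = 0;
        while records < max_records && written < buf.len() && self.pos < self.rep_levels.len() {
            buf[written] = self.rep_levels[self.pos];
            written += 1;
            self.pos += 1;
            // A record ends when the next level is 0 or the data runs out.
            if self.is_exhausted() || self.rep_levels[self.pos] == 0 {
                records += 1;
            }
        }
        (records, written)
    }
}

/// Caller-side loop: on a short read, grow the buffer and retry until the
/// requested number of records has been read or the data is exhausted.
fn read_requested(reader: &mut ToyReader, wanted: usize, initial_capacity: usize) -> (usize, Vec<i16>) {
    let mut buf = vec![0i16; initial_capacity.max(1)];
    let mut filled = 0;
    let mut records = 0;
    loop {
        let (r, n) = reader.read_records(wanted - records, &mut buf[filled..]);
        records += r;
        filled += n;
        if records >= wanted || reader.is_exhausted() {
            break;
        }
        // Short read: the buffer filled up before the request was satisfied.
        let new_len = buf.len() * 2;
        buf.resize(new_len, 0);
    }
    buf.truncate(filled);
    (records, buf)
}
```

   With levels `[0, 1, 1, 0, 0, 1]`, a request for 2 records, and an initial 
capacity of 2, the first call in this sketch fills the buffer mid-record and 
reports 0 complete records; only after the caller grows the buffer does the 
second call complete both records.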
   
   On the flip side, ColumnWriter needs to ensure it has complete records, as 
otherwise `write_mini_batch` might flush a page with a partial record, which, 
as discussed above, contravenes both the standard and the expectations of many 
readers.
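   
   As a rough illustration of what "complete records" means for a page split, 
the sketch below picks a cut point that never lands inside a record. 
`complete_record_prefix` is a hypothetical helper, not part of the parquet 
crate, and it assumes the slice passed in itself ends on a record boundary.

```rust
/// Hypothetical helper, NOT part of the parquet crate.
/// Returns the largest prefix length `k <= limit` such that the first `k`
/// repetition levels contain only whole records. A level of 0 starts a
/// record, so a cut at `k` is safe when `k` is 0, `k` is the slice length,
/// or `rep_levels[k] == 0`.
/// Assumption: `rep_levels` as a whole ends on a record boundary.
fn complete_record_prefix(rep_levels: &[i16], limit: usize) -> usize {
    let mut k = limit.min(rep_levels.len());
    // Walk back until the cut sits on a record boundary.
    while k > 0 && k < rep_levels.len() && rep_levels[k] != 0 {
        k -= 1;
    }
    k
}
```

   For levels `[0, 1, 1, 0, 0, 1]`, a limit of 5 gives a cut at 4 (two whole 
records), and a limit of 2 gives 0, since the first record needs three levels 
and a page holding only two of them would contain a partial record.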
   
   I will see if I can make `read_records` behave the way it is documented to 
behave, so that it never returns truncated records :sweat_smile: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

