westonpace commented on issue #35300: URL: https://github.com/apache/arrow/issues/35300#issuecomment-1524039687
Yes. I think the most common case is the end of a file. For example, if a file has 1 million rows and, for whatever reason, the data producer decided to batch in groups of 300k, then the last batch would have only 100k rows in it. However, I'm pretty sure producers are allowed to vary batch sizes however they like. For example, if some kind of filtering is applied during a read, you might get a stream of differently sized batches depending on how many rows happened to match the filter.
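
Here's a minimal sketch with pyarrow illustrating the point (the schema and batch sizes are just the hypothetical numbers from above, not anything mandated by the format): a single IPC stream can carry batches of different lengths, and consumers should not assume a fixed size.

```python
import pyarrow as pa

schema = pa.schema([("x", pa.int64())])

# Producer side: 1 million rows written in batches of 300k,
# so the final batch only has 100k rows.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, schema) as writer:
    for n in (300_000, 300_000, 300_000, 100_000):
        batch = pa.record_batch([pa.array(range(n))], schema=schema)
        writer.write_batch(batch)

# Consumer side: batch sizes simply come out as the producer chose them.
reader = pa.ipc.open_stream(sink.getvalue())
print([b.num_rows for b in reader])  # [300000, 300000, 300000, 100000]
```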
