sachouche commented on issue #1420: Drill 6664: Limit the maximum parquet reader batch rows to 64k URL: https://github.com/apache/drill/pull/1420#issuecomment-410401640

*What Parquet Used to Do*
- The Parquet reader used to hardcode this limit along with a comment: `DEFAULT_RECORDS_TO_READ_IF_FIXED_WIDTH = 64*1024 - 1; // 64K - 1, max SV2 can address`
- Unfortunately, the SelectionVector2 class only mentions "64k" in a comment (there is no mention of 64k-1)

*Memory Optimization*
- For a variable-length (VL) column, this optimization saves 64k of memory and avoids a reset and copy of the offset vector
- Fixed-length columns would waste a few bytes of space, since the last entry is unoccupied
- Is this optimization super important? I would say no: the current default batch memory is 16MB, so in practice record batches will have fewer than 64k rows when VL columns are involved.

*Conclusion*
- I like @paul-rogers' suggestion to use ValueVector.MAX_ROW_COUNT, as it satisfies the goal of this JIRA (64k) and brings us one step closer to standardization.

I'll update the changes shortly. Thanks for the feedback!
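The idea of replacing the hardcoded `64*1024 - 1` with a shared row-count constant can be sketched as below. This is a hypothetical illustration, not Drill's actual reader code; the `MAX_ROW_COUNT` constant here mirrors the 64k value discussed above (assumed to be `Character.MAX_VALUE + 1`), and the class and method names are made up for the example.

```java
// Hypothetical sketch: cap a configured Parquet reader batch size at the
// shared vector row limit instead of a locally hardcoded 64K - 1.
public class BatchSizeClamp {
    // Assumption for this sketch: the shared limit is 64K rows, i.e. the
    // number of distinct indices a 2-byte selection vector can address.
    static final int MAX_ROW_COUNT = Character.MAX_VALUE + 1; // 65536

    // Clamp a requested records-to-read value into [1, MAX_ROW_COUNT].
    static int clampRecordsToRead(int requested) {
        return Math.min(Math.max(requested, 1), MAX_ROW_COUNT);
    }

    public static void main(String[] args) {
        // A request above the limit is clamped down to 64K rows.
        System.out.println(clampRecordsToRead(100_000)); // 65536
        // A request within the limit passes through unchanged.
        System.out.println(clampRecordsToRead(4_096));   // 4096
    }
}
```

The point of the shared constant is that the reader, the selection vectors, and the batch sizer all agree on one limit, rather than each module restating "64k" (or "64k - 1") in its own comment.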
