sdf-jkl commented on PR #9118: URL: https://github.com/apache/arrow-rs/pull/9118#issuecomment-3801836924
Thanks, I understand. This issue got me thinking about it in my sleep. I agree that the tests with two int columns do not cover different distributions of pages. Originally, they were meant to cover the case where the mask was not page-aware at all. The new #9243 test now covers that scenario and also checks different page distributions, which seemingly makes the old tests redundant. The #9243 test covers different page distributions, despite also using a page limit, because one column is utf8. When building the row groups, the arrow writer is smart and will use dictionary encoding on that column. This adds a dictionary page at the beginning of the column chunk and creates an offset between pages.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
