sdf-jkl commented on PR #9118:
URL: https://github.com/apache/arrow-rs/pull/9118#issuecomment-3801836924

   Thanks, I understand. This issue had me thinking about it in my sleep.
   
   I agree that the tests with two int cols do not cover different 
distributions of pages. Originally, they were meant to cover the case where the 
mask was not page-aware at all. The new #9243 test now covers that scenario and 
also checks different page distributions, which seemingly makes the old tests 
redundant.
   
   The #9243 test covers different page distributions, even though it also 
uses a page limit, because one col is utf8. When building the row groups, the 
arrow writer is smart and will use dictionary encoding on that column. This 
adds a dictionary page at the beginning of the col chunk and creates an offset 
between the pages of the two columns.
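
   To make the offset concrete, here is a toy model in plain Rust. It is only a 
sketch and does not use the parquet crate: `Page`, `layout_chunk` and 
`page_position_of_row` are hypothetical names made up for illustration. It 
assumes both columns split data pages at the same row limit, and that the 
dictionary-encoded column gets exactly one dictionary page ahead of its data 
pages.

```rust
/// Toy model of the page sequence in one column chunk.
/// This is NOT the parquet crate's API; every name here is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Page {
    /// Dictionary page, written before any data page when the column is
    /// dictionary encoded.
    Dictionary,
    /// Data page covering rows `first_row..first_row + num_rows`.
    Data { first_row: usize, num_rows: usize },
}

/// Lay out `total_rows` rows into data pages of at most `row_limit` rows,
/// optionally preceded by a single dictionary page.
fn layout_chunk(total_rows: usize, row_limit: usize, dict_encoded: bool) -> Vec<Page> {
    assert!(row_limit > 0, "row_limit must be positive");
    let mut pages = Vec::new();
    if dict_encoded {
        pages.push(Page::Dictionary);
    }
    let mut first_row = 0;
    while first_row < total_rows {
        let num_rows = row_limit.min(total_rows - first_row);
        pages.push(Page::Data { first_row, num_rows });
        first_row += num_rows;
    }
    pages
}

/// Position within the chunk of the data page that holds `row`, if any.
fn page_position_of_row(pages: &[Page], row: usize) -> Option<usize> {
    pages.iter().position(|page| match page {
        Page::Data { first_row, num_rows } => {
            (*first_row..first_row + num_rows).contains(&row)
        }
        Page::Dictionary => false,
    })
}
```

   With 10 rows and a limit of 4 rows per page, `layout_chunk(10, 4, false)` 
(the int col) has 3 pages and `layout_chunk(10, 4, true)` (the utf8 col) has 4. 
Row 5 sits at page position 1 in the int col but at position 2 in the utf8 col, 
so the same row maps to different page positions in the two chunks even though 
the row boundaries of the data pages are identical.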
   
   

