judahrand commented on PR #37469:
URL: https://github.com/apache/arrow/pull/37469#issuecomment-1714355320

   > When statistics collection is disabled, it is currently hard to collect the 
`ColumnIndex`, because `ColumnIndex` relies on `Statistics`.
   
   Yeah, that makes sense! Would be good to clarify in the docs - will see if I 
get to it in another PR. 
   
   > Also, by the way, by default, if the column index is enabled, the page header 
statistics will not be written. (The spec says that if the column index exists, page 
header statistics are not meant to be written.)
   
   Yeah, what gets written and what doesn't is quite confusing. In fact, the 
spec doesn't technically say that page-level statistics must be omitted when writing the 
ColumnIndex, only that writing them isn't recommended (one might want both in order to 
support old readers).
   
https://github.com/apache/parquet-format/blob/master/PageIndex.md#technical-approach
   
   It's probably a sensible default behaviour, but it'd be nice to be able to force 
writing both. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

