etseidl commented on PR #197:
URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1700313640

   > As an implementation detail, can we omit the `rep-def` histograms when `max-rep <= 1` and `max-def <= 1`, since we already have page-ordinal in OffsetIndex and null-count in ColumnIndex? This might take less space, but it makes things a bit trickier. @etseidl @emkornfield
   
   I think that would be ok. My current implementation only writes the histograms when `max_level > 0`, but that could easily be changed to `max_level > 1`. On the read side the logic is a little more involved, but not unmanageable, especially since we already have to handle the `max_level == 0` case. Once we settle on where everything goes, I'll modify my code to use the new structures and see if there are any problems. @emkornfield does this work for you?
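A minimal sketch of the writer rule and the read-side fallback, written in Python purely for illustration. The function names and parameters are hypothetical, not from any Parquet implementation. It assumes that, when the histogram is omitted for `max_level == 1`, the reader can recover the count of level-0 entries elsewhere: the null count for a flat optional column (definition levels), or the number of rows in the page for repetition levels (e.g. from consecutive `first_row_index` values in the OffsetIndex, assuming pages start on record boundaries).

```python
from typing import List, Optional


def should_write_histogram(max_level: int) -> bool:
    # Hypothetical writer rule from the discussion: only write a histogram
    # when it can't be recovered from other metadata, i.e. max_level > 1.
    return max_level > 1


def level_histogram(
    max_level: int,
    num_levels: int,
    num_zero_levels: int,
    stored: Optional[List[int]] = None,
) -> List[int]:
    """Return a page's level histogram, reconstructing it if it was omitted.

    num_levels: total repetition or definition levels in the page.
    num_zero_levels: count of level-0 entries (assumed to come from the null
    count for def levels of a flat optional column, or the page's row count
    for rep levels).
    """
    if stored is not None:
        # The writer stored the histogram, so use it as-is.
        return stored
    if not 0 <= num_zero_levels <= num_levels:
        raise ValueError("num_zero_levels must be in [0, num_levels]")
    if max_level == 0:
        # Only level 0 can occur, so every level lands in bucket 0.
        return [num_levels]
    if max_level == 1:
        # Two buckets: level 0 and level 1; the second is the remainder.
        return [num_zero_levels, num_levels - num_zero_levels]
    raise ValueError("a stored histogram is required when max_level > 1")
```

For example, a flat optional column page with 10 values and 3 nulls has `max_level == 1`, so nothing is written and `level_histogram(1, 10, 3)` rebuilds `[3, 7]` on read.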
   
   > The second is that I think sizes would fit better in `OffsetIndex` rather than `ColumnIndex`.
   
   I'm fine with this. It's somewhat in the weeds, but splitting things up this way saves a little space and processing, since we no longer have to encode the `SizeStatistics` wrapper.
   
   

