etseidl commented on PR #197: URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1700313640
> As an implementation detail, can we ignore the `rep-def` histogram when `max-rep <= 1, max-def <= 1`? Since we already have page-ordinal in OffsetIndex and null-count in ColumnIndex? This might take less space but make it a bit tricky. @etseidl @emkornfield

I think that would be ok. My current implementation only writes the histograms when `max_level > 0`, but could easily be changed to `> 1`. On the read side, the logic is a little harder, but not unmanageable, especially since we already have to deal with the `max_level == 0` case. Once we settle on where everything goes, I'll modify my code to make use of the new structures and see if there are any problems. @emkornfield does this work for you?

> The second is that, I think should size better in `OffsetIndex` rather than `ColumnIndex`.

I'm fine with this. Kind of in the weeds, but by splitting it up this way we do save a little bit of space and processing by not having to encode the `SizeStatistics` wrapper.
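For illustration only, here is a rough sketch of the write-side threshold and the read-side fallback discussed above. It covers only the definition-level histogram of a flat column (`max_rep == 0`), and it assumes `num_values` counts nulls and that `null_count` is available from the ColumnIndex. The function names and the `min_level` parameter are hypothetical and do not come from parquet-format or from any existing implementation.

```python
from typing import List, Optional


def should_write_histogram(max_level: int, min_level: int = 1) -> bool:
    """Write-side check (hypothetical helper).

    min_level=0 models the `max_level > 0` rule, min_level=1 the `> 1` rule.
    """
    return max_level > min_level


def def_level_histogram(
    stored: Optional[List[int]],
    max_def_level: int,
    num_values: int,
    null_count: int,
) -> List[int]:
    """Read-side fallback for a flat column (assumes max_rep == 0).

    Returns the stored histogram when the writer emitted one; otherwise
    rebuilds it from counts the reader is assumed to already have.
    """
    if stored is not None:
        return list(stored)
    if max_def_level == 0:
        # Required column: every value sits at definition level 0.
        return [num_values]
    if max_def_level == 1:
        # Optional flat column: level 0 is null, level 1 is non-null.
        return [null_count, num_values - null_count]
    raise ValueError("histogram missing for max_def_level > 1")
```

With the `> 1` rule, an optional flat column with 10 values and 3 nulls would carry no histogram, and the reader would rebuild `[3, 7]` from the null count.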
