etseidl commented on PR #197: URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1707166521
> I think we can now move to the simpler option of just putting SizeStatistics on Column Index to consolidate everything? I would guess this would also make implementations simpler.

I have both ways implemented on the writer side, so I do not hold a strong opinion in that regard. With regard to clients, if the unencoded byte sizes were to remain in the OffsetIndex, I can think of only one case where I would use the OffsetIndex alone and not also read the ColumnIndex. I do think using SizeStatistics on both the ColumnIndex and ColumnMetaData is more consistent. However, I'm happy to yield to whoever has the strongest opinion.
