emkornfield commented on PR #197: URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1492748667
> Do we want to include these statistics at both row group (column chunk) and page level? For the latter I am not sure it is the right approach. We implemented column indexes so one would not need to read the page header to get the related statistics. We even stopped writing `Statistics` into page headers in parquet-mr. If we only want these for the column chunk level then I would suggest having it under `ColumnMetaData` directly.

@gszadovsky Is there an argument against keeping the flexibility here? I believe parquet-cpp still writes `Statistics` into page headers. One argument for page-level statistics is that they let readers refine their estimate of the memory needed incrementally as they progress through a column chunk (although it is possible that an average size per cell taken at the column chunk level is sufficient here).
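To make the trade-off concrete, here is a toy sketch of the two estimation strategies. It assumes hypothetical per-page and per-chunk counters of value count and decoded byte size. The names `PageSizeStats`, `ChunkSizeStats`, `chunk_level_estimate` and `page_level_estimate` are illustrative only and are not the fields or APIs proposed in this PR.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified size counters; NOT the parquet-format Thrift structs.
@dataclass
class PageSizeStats:
    num_values: int       # number of values (cells) in the page
    unencoded_bytes: int  # bytes those values occupy once decoded


@dataclass
class ChunkSizeStats:
    num_values: int
    unencoded_bytes: int


def chunk_level_estimate(chunk: ChunkSizeStats, values_to_read: int) -> float:
    """Estimate decoded bytes from the column-chunk average size per cell."""
    if chunk.num_values == 0:
        return 0.0
    avg_per_cell = chunk.unencoded_bytes / chunk.num_values
    return avg_per_cell * values_to_read


def page_level_estimate(pages: List[PageSizeStats], start_page: int,
                        values_to_read: int) -> int:
    """Estimate decoded bytes from per-page counters, starting at the
    page the reader has reached."""
    total = 0
    remaining = values_to_read
    for page in pages[start_page:]:
        if remaining <= 0:
            break
        if page.num_values == 0:
            continue  # skip empty pages to avoid dividing by zero
        take = min(remaining, page.num_values)
        # Pro-rate the page's byte count when only part of it is consumed.
        total += page.unencoded_bytes * take // page.num_values
        remaining -= take
    return total


# Skewed chunk: small values in the first page, large values in the second.
pages = [PageSizeStats(100, 400), PageSizeStats(100, 40_000)]
chunk = ChunkSizeStats(200, 40_400)
```

With this skewed data, the chunk-level average predicts 20,200 bytes for any 100 values. The page-level counters give 400 bytes for the first page and 40,000 for the second. If value sizes are roughly uniform across the chunk, the two estimates converge, and the simpler `ColumnMetaData`-only approach is likely enough.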
