[ https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760692#comment-17760692 ]
ASF GitHub Bot commented on PARQUET-2261:
-----------------------------------------

mapleFU commented on PR #197:
URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1700303344

   After seeing your data in https://github.com/apache/parquet-format/pull/197#issuecomment-1699773196, I'm now in favor of having a size in `OffsetIndex`.

   As an implementation detail, can we omit the `rep-def` histogram when `max-rep <= 1, max-def <= 1`, since we already have the page ordinal in `OffsetIndex` and the null count in `ColumnIndex`? This might take less space, but it makes things a bit trickier. @etseidl @emkornfield

   Second, I think the size fits better in `OffsetIndex` than in `ColumnIndex`.


> [Format] Add statistics that reflect decoded size to metadata
> -------------------------------------------------------------
>
>                 Key: PARQUET-2261
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2261
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
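
As a rough illustration of the question raised in the comment (not taken from the PR or from the Parquet specification): for a flat optional column, meaning a maximum repetition level of 0 and a maximum definition level of 1, a per-page definition-level histogram could in principle be recomputed from the page's row count and its null count. The helper names below are hypothetical, and the sketch assumes that the page ordinal mentioned above corresponds to each page's first row index and that the total row count is available from elsewhere in the metadata. The `max-rep == 1` case is deliberately not covered.

```python
from typing import List


def page_row_counts(first_row_indexes: List[int], total_rows: int) -> List[int]:
    """Hypothetical helper: per-page row counts from each page's first row index.

    Assumes the indexes are sorted and that total_rows is the row count of
    the enclosing row group.
    """
    ends = first_row_indexes[1:] + [total_rows]
    return [end - start for start, end in zip(first_row_indexes, ends)]


def derive_def_histogram(num_rows: int, null_count: int) -> List[int]:
    """Hypothetical helper: rebuild a definition-level histogram for one page.

    Assumes a flat optional column (max repetition level 0, max definition
    level 1), so every row holds exactly one value, which is either null
    (definition level 0) or present (definition level 1).
    """
    if not 0 <= null_count <= num_rows:
        raise ValueError("null_count must be between 0 and num_rows")
    # Index i of the result counts the values with definition level i.
    return [null_count, num_rows - null_count]


if __name__ == "__main__":
    rows = page_row_counts([0, 100, 250], total_rows=300)
    nulls = [7, 0, 50]  # made-up per-page null counts for illustration
    for n_rows, n_nulls in zip(rows, nulls):
        print(derive_def_histogram(n_rows, n_nulls))
```

Under these assumptions a stored histogram would carry no extra information for such columns, which is the space saving the comment alludes to; the cost is that readers need this special case.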