[ 
https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760692#comment-17760692
 ] 

ASF GitHub Bot commented on PARQUET-2261:
-----------------------------------------

mapleFU commented on PR #197:
URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1700303344

   After seeing your data in 
https://github.com/apache/parquet-format/pull/197#issuecomment-1699773196 , 
I'm now in favor of having a size in `OffsetIndex`. 
   
   As an implementation detail, can we omit the `rep-def` histogram when 
`max-rep <= 1, max-def <= 1`, since we already have the page-ordinal in 
`OffsetIndex` and the null-count in `ColumnIndex`? This would take less space 
but make things a bit trickier. @etseidl @emkornfield 
   
   Second, I think the size fits better in `OffsetIndex` than in 
`ColumnIndex`.




> [Format] Add statistics that reflect decoded size to metadata
> -------------------------------------------------------------
>
>                 Key: PARQUET-2261
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2261
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
