[ 
https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759079#comment-17759079
 ] 

ASF GitHub Bot commented on PARQUET-2261:
-----------------------------------------

m29498 commented on PR #197:
URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1693533716

   Thanks @GregoryKimball and @etseidl. We would also find this change very 
useful. As @GregoryKimball mentioned, we can use the extra size statistics in 
the page footer to more accurately predict the memory usage of decompressed 
pages in files.
   
   We have a use case based on rapidsai/cudf that would greatly benefit from 
the chunked Parquet reader working in the manner described above. Currently, we 
have to do a lot of work in our GPU-based Parquet reader to ensure that we 
don't try to read more of a Parquet file than we have room to decompress in GPU 
memory. With this change, that information would be available in the file and 
no prediction of sizes would be necessary.
   
   We would really like to see this implemented!




> [Format] Add statistics that reflect decoded size to metadata
> -------------------------------------------------------------
>
>                 Key: PARQUET-2261
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2261
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
