[ https://issues.apache.org/jira/browse/PARQUET-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759079#comment-17759079 ]
ASF GitHub Bot commented on PARQUET-2261:
-----------------------------------------

m29498 commented on PR #197:
URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1693533716

   Thanks @GregoryKimball and @etseidl.

   We would also find this change very useful. As @GregoryKimball mentioned, we can use the extra size statistics in the page footer to predict the memory usage of decompressed pages in files more accurately. We have a use case based on rapidsai/cudf that would greatly benefit from the chunked Parquet reader working in the manner described above.

   Currently, we have to do a lot of work in our GPU-based Parquet reader to ensure that we don't try to read more of a Parquet file than we have room to decompress in GPU memory. With this change, that information would be available in the file and no prediction of sizes would be necessary.

   We would really like to see this implemented!


> [Format] Add statistics that reflect decoded size to metadata
> -------------------------------------------------------------
>
>                 Key: PARQUET-2261
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2261
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Micah Kornfield
>            Assignee: Micah Kornfield
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)