[GitHub] [parquet-format] m29498 commented on pull request #197: PARQUET-2261: add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

via GitHub Fri, 25 Aug 2023 08:21:22 -0700


m29498 commented on PR #197:
URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1693533716


   Thanks @GregoryKimball and @etseidl. We would also find this change very 
useful. As @GregoryKimball mentioned, we can use the extra size statistics in 
the page footer to predict the memory usage of decompressed pages in files 
more accurately.
   
   We have a use case based on rapidsai/cudf that would greatly benefit from 
the chunked Parquet reader working in the manner described above. Currently, we 
have to do a lot of work in our GPU-based Parquet reader to ensure that we 
don't try to read more of a Parquet file than we have room to decompress in GPU 
memory. With this change, that information would be available in the file and 
no prediction of sizes would be necessary.
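
To illustrate the kind of planning this would simplify, here is a minimal sketch in Python. It assumes exact uncompressed sizes are already known for each page or row group. The names `plan_chunks`, `uncompressed_sizes`, and `budget_bytes` are hypothetical and are not part of cudf or the Parquet format; a real reader would also need to account for decoding scratch space.

```python
from typing import Iterable, List


def plan_chunks(uncompressed_sizes: Iterable[int], budget_bytes: int) -> List[List[int]]:
    """Greedily group consecutive indices so that each group's total
    uncompressed size stays within budget_bytes.

    Hypothetical helper for illustration only. A single item larger than
    the budget is placed in a chunk of its own.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, size in enumerate(uncompressed_sizes):
        # Close the current chunk if adding this item would exceed the budget.
        if current and current_bytes + size > budget_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

With sizes stored in the file, the input to such a planner would be read directly from metadata rather than estimated.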
   
   We would really like to see this implemented!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
