m29498 commented on PR #197: URL: https://github.com/apache/parquet-format/pull/197#issuecomment-1693533716
Thanks @GregoryKimball and @etseidl. We would also find this change very useful. As @GregoryKimball mentioned, we can use the extra size statistics in the page footer to predict the memory usage of decompressed pages in files more accurately. We have a use case based on rapidsai/cudf that would greatly benefit from the chunked Parquet reader working in the manner described above. Currently, our GPU-based Parquet reader has to do a lot of work to ensure that we don't try to read more of a Parquet file than we have room to decompress in GPU memory. With this change, that information would be available in the file and no prediction of sizes would be necessary. We would really like to see this implemented!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
