adamreeve opened a new pull request, #8671: URL: https://github.com/apache/arrow-rs/pull/8671
# Which issue does this PR close?

- Closes #8472.

# Rationale for this change

Makes the metadata heap size calculation more accurate when reading encrypted Parquet files, which helps to better manage caches of Parquet metadata.

# What changes are included in this PR?

* Accounts for heap allocations related to the `FileDecryptor` in `ParquetMetaData`
* Does not account for any user-provided `KeyRetriever`

# Are these changes tested?

Yes, there's a new unit test that computes the heap size with a decryptor.

I also did a manual test that created a Parquet file with 100 columns using per-column encryption keys, and loaded 10,000 copies of the `ParquetMetaData` into a vector. `heaptrack` reported 1.1 GB of heap memory allocated by this test program. Prior to this change, the summed metadata size was reported as 879.2 MB; afterwards it was 952.6 MB.

I'm not sure whether there's a better way to test the accuracy of this calculation.

# Are there any user-facing changes?

No.

This was co-authored by @etseidl. I haven't changed their original implementation much beyond adding a test and some comments.
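For readers unfamiliar with this kind of accounting, here is a minimal, self-contained sketch of the idea described above: metadata that optionally owns a decryptor reports a larger heap size once the decryptor's key material is counted. Everything in it is hypothetical and for illustration only (the `HeapSize` trait as written here, `ToyDecryptor`, `ToyMetadata`, `build_toy_metadata`, `total_memory_size`, and the 16-byte key length); it is not the code in this PR and does not use the `parquet` crate's types.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Hypothetical trait: reports bytes owned on the heap, excluding the inline size of `self`.
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for Vec<u8> {
    fn heap_size(&self) -> usize {
        // Count the allocated capacity, not just the bytes in use.
        self.capacity()
    }
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

/// Hypothetical stand-in for a decryptor holding a footer key and per-column keys.
struct ToyDecryptor {
    footer_key: Vec<u8>,
    column_keys: HashMap<String, Vec<u8>>,
}

impl HeapSize for ToyDecryptor {
    fn heap_size(&self) -> usize {
        // Lower bound for the map: the inline size of each stored (key, value) pair plus
        // what each one owns. Empty buckets and control bytes are not counted.
        let entries: usize = self
            .column_keys
            .iter()
            .map(|(name, key)| {
                size_of::<String>() + size_of::<Vec<u8>>() + name.heap_size() + key.heap_size()
            })
            .sum();
        self.footer_key.heap_size() + entries
    }
}

/// Hypothetical stand-in for file metadata that may own a decryptor.
struct ToyMetadata {
    schema_bytes: Vec<u8>,
    decryptor: Option<ToyDecryptor>,
}

impl HeapSize for ToyMetadata {
    fn heap_size(&self) -> usize {
        // The `Option` is stored inline, so only the decryptor's own heap data is added.
        self.schema_bytes.heap_size() + self.decryptor.as_ref().map_or(0, |d| d.heap_size())
    }
}

/// Builds toy metadata with `num_columns` columns and, optionally, one key per column.
fn build_toy_metadata(num_columns: usize, with_decryptor: bool) -> ToyMetadata {
    let decryptor = with_decryptor.then(|| ToyDecryptor {
        footer_key: vec![0u8; 16], // assumed 16-byte key, for illustration
        column_keys: (0..num_columns)
            .map(|i| (format!("col_{i}"), vec![0u8; 16]))
            .collect(),
    });
    ToyMetadata {
        schema_bytes: vec![0u8; 64 * num_columns],
        decryptor,
    }
}

/// Sums inline size plus heap size over many copies, as in the manual test.
fn total_memory_size(items: &[ToyMetadata]) -> usize {
    items
        .iter()
        .map(|m| size_of::<ToyMetadata>() + m.heap_size())
        .sum()
}
```

Comparing `total_memory_size` for copies built with and without a decryptor shows the same kind of gap as the before and after figures above. Because the sketch ignores allocator and hash-table overhead, its total is a lower bound on what a tool like `heaptrack` would report.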
