[PR] [Parquet] Account for FileDecryptor in ParquetMetaData heap size calculation [arrow-rs]

via GitHub Mon, 20 Oct 2025 19:30:30 -0700


adamreeve opened a new pull request, #8671:
URL: https://github.com/apache/arrow-rs/pull/8671


   # Which issue does this PR close?
   
   - Closes #8472.
   
   # Rationale for this change
   
   Makes the metadata heap size calculation more accurate when reading 
encrypted Parquet files, which helps to better manage caches of Parquet 
metadata.
   
   # What changes are included in this PR?
   
   * Accounts for heap allocations related to the `FileDecryptor` in 
`ParquetMetaData`
   * Does not account for any user-provided `KeyRetriever`
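
   As a rough illustration of the two points above, here is a minimal, self-contained sketch of this style of heap accounting. All names (`HeapSize`, `SketchDecryptor`, `SketchMetaData`, `sketch_decryptor`) are hypothetical stand-ins and are not the crate's actual types or the code in this PR. The hash table overhead is only approximated, and the user-provided callback is deliberately left uncounted.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Hypothetical trait: bytes a value owns on the heap,
/// excluding the value's own inline (stack) size.
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for Vec<u8> {
    fn heap_size(&self) -> usize {
        // Count the allocated capacity, not just the used length.
        self.capacity()
    }
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Option<T> {
    fn heap_size(&self) -> usize {
        self.as_ref().map_or(0, |v| v.heap_size())
    }
}

/// Hypothetical stand-in for a file decryptor holding key material.
struct SketchDecryptor {
    footer_key: Vec<u8>,
    column_keys: HashMap<String, Vec<u8>>,
    aad_prefix: Option<Vec<u8>>,
    /// User-provided callback: opaque to us, so it is NOT counted.
    key_retriever: Option<Box<dyn Fn(&[u8]) -> Vec<u8>>>,
}

impl HeapSize for SketchDecryptor {
    fn heap_size(&self) -> usize {
        // Approximate the table allocation by its slot capacity
        // (this ignores the hash map's internal control bytes).
        let table = self.column_keys.capacity() * size_of::<(String, Vec<u8>)>();
        // Heap owned by each key and value stored in the table.
        let entries: usize = self
            .column_keys
            .iter()
            .map(|(k, v)| k.heap_size() + v.heap_size())
            .sum();
        self.footer_key.heap_size() + table + entries + self.aad_prefix.heap_size()
    }
}

/// Hypothetical stand-in for file metadata with an optional decryptor.
struct SketchMetaData {
    schema_bytes: Vec<u8>,
    decryptor: Option<SketchDecryptor>,
}

impl SketchMetaData {
    /// Inline size plus everything owned on the heap.
    fn memory_size(&self) -> usize {
        size_of::<Self>() + self.schema_bytes.heap_size() + self.decryptor.heap_size()
    }
}

/// Builds a decryptor with one 16-byte key per column (illustrative values).
fn sketch_decryptor(num_columns: usize) -> SketchDecryptor {
    let column_keys = (0..num_columns)
        .map(|i| (format!("col_{}", i), vec![0u8; 16]))
        .collect();
    SketchDecryptor {
        footer_key: vec![0u8; 16],
        column_keys,
        aad_prefix: None,
        key_retriever: None,
    }
}
```

   In this sketch, a `SketchMetaData` built with `Some(sketch_decryptor(100))` reports a larger `memory_size()` than one built with `None`, and the difference grows with the number of per-column keys.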
   
   # Are these changes tested?
   
   Yes, there's a new unit test added that computes the heap size with a 
decryptor.
   
   I also did a manual test that created a Parquet file with 100 columns 
using per-column encryption keys and loaded 10,000 copies of its 
`ParquetMetaData` into a vector. `heaptrack` reported 1.1 GB of heap memory 
allocated by this test program. Prior to this change, the sum of the reported 
metadata sizes was 879.2 MB, and afterwards it was 952.6 MB.
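
   To put these figures in perspective, the arithmetic can be sketched as below. This assumes 1 MB = 10^6 bytes and treats the rounded 1.1 GB figure as 1100 MB, so the results are approximate. The function names are illustrative only.

```rust
/// Fraction of the profiler-measured heap that the estimate accounts for.
fn coverage(estimated_mb: f64, measured_mb: f64) -> f64 {
    estimated_mb / measured_mb
}

/// Additional bytes accounted for per metadata copy after the change,
/// assuming 1 MB = 1_000_000 bytes.
fn added_bytes_per_copy(before_mb: f64, after_mb: f64, copies: u32) -> f64 {
    (after_mb - before_mb) * 1_000_000.0 / f64::from(copies)
}

/// Additional bytes accounted for per column in each metadata copy.
fn added_bytes_per_column(before_mb: f64, after_mb: f64, copies: u32, columns: u32) -> f64 {
    added_bytes_per_copy(before_mb, after_mb, copies) / f64::from(columns)
}
```

   Under those assumptions, coverage rises from roughly 79.9% (`coverage(879.2, 1100.0)`) to roughly 86.6% (`coverage(952.6, 1100.0)`), and the change accounts for about 7.3 kB more per metadata copy, or about 73 bytes per encrypted column.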
   
   I'm not sure if there's any better way to test the accuracy of this 
calculation?
   
   # Are there any user-facing changes?
   
   No
   
   This was co-authored by @etseidl. I haven't changed their original 
implementation much beyond adding a test and some comments.


