Re: [PR] [Parquet] Account for FileDecryptor in ParquetMetaData heap size calculation [arrow-rs]

via GitHub Thu, 23 Oct 2025 18:40:49 -0700


adamreeve commented on PR #8671:
URL: https://github.com/apache/arrow-rs/pull/8671#issuecomment-3440325242


   > `SchemaDescriptor` is already counting the heap for the tree of `Type` 
pointers, but then each `ColumnDescriptor` is also counting the same objects. 
Perhaps the impl for `ColumnDescriptor` should be more like 
`self.path.heap_size() + 2 * std::mem::size_of::<usize>()` 🤷
   
   I was going to comment that the `ColumnDescriptor`s themselves are also 
referenced from `file_metadata.schema_descr.leaves`, 
`row_groups[rg].schema_descr` and `row_groups[rg].columns[c].column_desc`. But 
then I saw that this is already accounted for:
   
   
https://github.com/apache/arrow-rs/blob/d519bb800340fa1a5e2601ae51cba82be3a7aa4b/parquet/src/file/metadata/memory.rs#L101-L103
   
   
https://github.com/apache/arrow-rs/blob/d519bb800340fa1a5e2601ae51cba82be3a7aa4b/parquet/src/file/metadata/memory.rs#L115-L116
   
   So applying a similar solution to prevent duplicate accounting of the `Type` 
pointers probably makes sense. It expands the scope of this PR a little, 
but it's a pretty small change so I think it's fine to add here. I think the 
impl should only be `self.path.heap_size()` though, since the sizes of the 
pointers are already accounted for in `size_of::<ColumnDescriptor>`.
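
A minimal sketch of that accounting, using simplified stand-in types rather than the real parquet definitions. The `HeapSize` trait, the struct fields, and the `demo_descriptor` helper below are assumptions made for illustration only:

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Simplified stand-in for a heap-size trait (assumed shape, not the crate's code).
trait HeapSize {
    /// Bytes owned on the heap, excluding the inline size of `Self`.
    fn heap_size(&self) -> usize;
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        // The buffer holding the elements, plus whatever each element owns.
        self.capacity() * size_of::<T>()
            + self.iter().map(|item| item.heap_size()).sum::<usize>()
    }
}

/// Hypothetical schema node; the real `Type` is considerably richer.
#[allow(dead_code)]
struct Type {
    name: String,
}

/// Hypothetical column path.
struct ColumnPath {
    parts: Vec<String>,
}

impl HeapSize for ColumnPath {
    fn heap_size(&self) -> usize {
        self.parts.heap_size()
    }
}

/// Hypothetical descriptor: a shared pointer into the schema tree plus a path.
#[allow(dead_code)]
struct ColumnDescriptor {
    primitive_type: Arc<Type>,
    path: ColumnPath,
}

impl HeapSize for ColumnDescriptor {
    fn heap_size(&self) -> usize {
        // The `Type` behind the `Arc` is assumed to be counted once by the
        // schema that owns the tree, so it is skipped here. The pointer itself
        // is stored inline and is covered by `size_of::<ColumnDescriptor>()`.
        self.path.heap_size()
    }
}

/// Builds a small descriptor for experimentation.
fn demo_descriptor() -> ColumnDescriptor {
    ColumnDescriptor {
        primitive_type: Arc::new(Type { name: "b".to_string() }),
        path: ColumnPath {
            parts: vec!["a".to_string(), "b".to_string()],
        },
    }
}
```

With this shape, a caller that wants the full footprint of one descriptor adds `size_of::<ColumnDescriptor>()` to `heap_size()`, and the shared `Type` tree is counted in one place only.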


