adamreeve commented on PR #8671: URL: https://github.com/apache/arrow-rs/pull/8671#issuecomment-3440325242
> `SchemaDescriptor` is already counting the heap for the tree of `Type` pointers, but then each `ColumnDescriptor` is also counting the same objects. Perhaps the impl for `ColumnDescriptor` should be more like `self.path.heap_size() + 2 * std::mem::size_of::<usize>()` 🤷

I was going to comment that the `ColumnDescriptor`s themselves are also referenced from `file_metadata.schema_descr.leaves`, `row_groups[rg].schema_descr`, and `row_groups[rg].columns[c].column_desc`. But then I saw that this is already accounted for:

https://github.com/apache/arrow-rs/blob/d519bb800340fa1a5e2601ae51cba82be3a7aa4b/parquet/src/file/metadata/memory.rs#L101-L103

https://github.com/apache/arrow-rs/blob/d519bb800340fa1a5e2601ae51cba82be3a7aa4b/parquet/src/file/metadata/memory.rs#L115-L116

So applying a similar solution to prevent duplicate accounting of the `Type` pointers probably makes sense. It's expanding the scope of this PR a little, but it's a pretty small change, so I think it's fine to add here.

I think the impl should only be `self.path.heap_size()` though; the sizes of the pointers will be accounted for in `size_of::<ColumnDescriptor>`.
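
To illustrate the double-counting concern, here is a minimal sketch using simplified stand-in types. Only the `HeapSize` trait name and the `path` / `primitive_type` field names are modelled on the parquet crate; `TypeNode`, the flat `types` list, `naive_leaf_heap_size`, and `example_schema` are hypothetical and exist only for this example. The sketch also ignores the `Arc` reference-count words. The idea is that the schema descriptor counts each shared type node once, so a column descriptor reports only its path.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Simplified stand-in for a heap-accounting trait: bytes owned on the heap,
/// excluding the inline size of `Self`.
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        // Buffer holding the elements inline, plus whatever each element owns.
        let buffer = self.capacity() * size_of::<T>();
        let nested: usize = self.iter().map(|item| item.heap_size()).sum();
        buffer + nested
    }
}

impl<T: HeapSize> HeapSize for Arc<T> {
    fn heap_size(&self) -> usize {
        // The pointee lives on the heap (reference counts ignored here).
        size_of::<T>() + (**self).heap_size()
    }
}

/// Hypothetical stand-in for a schema type node.
struct TypeNode {
    name: String,
}

impl HeapSize for TypeNode {
    fn heap_size(&self) -> usize {
        self.name.heap_size()
    }
}

struct ColumnDescriptor {
    primitive_type: Arc<TypeNode>, // shared with the schema's type list
    path: Vec<String>,
}

impl HeapSize for ColumnDescriptor {
    fn heap_size(&self) -> usize {
        // Only the path: the shared type node is counted by the schema, and
        // the pointer itself is part of size_of::<ColumnDescriptor>().
        self.path.heap_size()
    }
}

struct SchemaDescriptor {
    types: Vec<Arc<TypeNode>>,
    leaves: Vec<Arc<ColumnDescriptor>>,
}

impl HeapSize for SchemaDescriptor {
    fn heap_size(&self) -> usize {
        self.types.heap_size() + self.leaves.heap_size()
    }
}

/// What a leaf would report if it also counted the shared type node.
fn naive_leaf_heap_size(leaf: &ColumnDescriptor) -> usize {
    leaf.heap_size() + leaf.primitive_type.heap_size()
}

/// Builds a two-column schema whose leaves share the schema's type nodes.
fn example_schema() -> SchemaDescriptor {
    let types: Vec<Arc<TypeNode>> = ["a", "b"]
        .iter()
        .map(|n| Arc::new(TypeNode { name: n.to_string() }))
        .collect();
    let leaves = types
        .iter()
        .map(|t| {
            Arc::new(ColumnDescriptor {
                primitive_type: Arc::clone(t),
                path: vec![t.name.clone()],
            })
        })
        .collect();
    SchemaDescriptor { types, leaves }
}
```

In this model, `naive_leaf_heap_size` exceeds `ColumnDescriptor::heap_size` by exactly the size of the shared type node, which is the amount that would be counted a second time for every leaf.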
