yashrb24 opened a new pull request, #21249: URL: https://github.com/apache/datafusion/pull/21249
## Which issue does this PR close?

- Closes #21248

## Rationale for this change

`ArrowBytesMap` and `ArrowBytesViewMap` allocate their hash tables with `HashTable::with_capacity(INITIAL_MAP_CAPACITY)` but initialize `map_size` to `0`. The `insert_accounted` method only tracks incremental growth beyond the current capacity. As a result, the initial allocation from `with_capacity` is never counted, and `size()` understates memory usage by the size of that allocation.

## What changes are included in this PR?

Initialize `map_size` with `map.allocation_size()` in both `ArrowBytesMap::new` and `ArrowBytesViewMap::new` to capture the pre-allocated memory.

The two other places in the codebase that set `map_size: 0` (`row.rs` and `multi_group_by/mod.rs`) create their tables with `HashTable::with_capacity(0)`, which allocates nothing. They are already correct and are left unchanged.

## Are these changes tested?

The change is a one-line initialization fix in each constructor. Existing tests cover the behavior of the `size()` method.

## Are there any user-facing changes?

No API changes. `size()` now returns a more accurate memory estimate that includes the initial hash table allocation.
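The accounting gap described under the rationale can be illustrated without DataFusion or `hashbrown`. The sketch below is hypothetical and only models the idea. `AccountedSet`, `approx_alloc`, and `compare` are names made up for illustration. std's `HashSet` stands in for `HashTable`. Because std exposes no equivalent of `allocation_size()`, the allocation is approximated as capacity times element size. This is not DataFusion's actual `insert_accounted`; it only reproduces "count growth, not the starting allocation".

```rust
use std::collections::HashSet;
use std::mem::size_of;

/// Hypothetical stand-in for `allocation_size()`: std's `HashSet` does not
/// expose its real allocation, so approximate it as capacity * element size
/// (this ignores control bytes and other per-table overhead).
fn approx_alloc(set: &HashSet<u64>) -> usize {
    set.capacity() * size_of::<u64>()
}

/// Hypothetical, simplified analogue of a map with a `map_size` field.
struct AccountedSet {
    set: HashSet<u64>,
    map_size: usize,
}

impl AccountedSet {
    /// `seed_initial = false` mimics the old `map_size: 0` initialization;
    /// `seed_initial = true` mimics seeding it with the pre-allocated size.
    fn new(initial_capacity: usize, seed_initial: bool) -> Self {
        let set = HashSet::with_capacity(initial_capacity);
        let map_size = if seed_initial { approx_alloc(&set) } else { 0 };
        Self { set, map_size }
    }

    /// Incremental accounting: only growth caused by this insert is added.
    /// Capacity never shrinks on insert, so the subtraction cannot underflow.
    fn insert_accounted(&mut self, value: u64) {
        let before = approx_alloc(&self.set);
        self.set.insert(value);
        self.map_size += approx_alloc(&self.set) - before;
    }

    fn size(&self) -> usize {
        self.map_size
    }
}

/// Inserts `n` distinct values into both variants and returns
/// `(unseeded_size, seeded_size)`.
fn compare(initial_capacity: usize, n: u64) -> (usize, usize) {
    let mut unseeded = AccountedSet::new(initial_capacity, false);
    let mut seeded = AccountedSet::new(initial_capacity, true);
    for v in 0..n {
        unseeded.insert_accounted(v);
        seeded.insert_accounted(v);
    }
    (unseeded.size(), seeded.size())
}
```

With few inserts and no growth, the unseeded variant reports `0` even though memory is already allocated. After the table grows, it still trails the seeded variant by exactly the initial allocation, because incremental accounting never revisits the starting point.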
