yashrb24 opened a new pull request, #21249:
URL: https://github.com/apache/datafusion/pull/21249

   ## Which issue does this PR close?
   
   - Closes #21248
   
   ## Rationale for this change
   
   `ArrowBytesMap` and `ArrowBytesViewMap` allocate their hash tables with 
`HashTable::with_capacity(INITIAL_MAP_CAPACITY)` but initialize `map_size` to 
`0`. The `insert_accounted` method only tracks incremental growth beyond the 
current capacity, so the initial allocation from `with_capacity` is never 
counted, and `size()` understates memory usage until the first resize.
   
   ## What changes are included in this PR?
   
   Initialize `map_size` with `map.allocation_size()` in both 
`ArrowBytesMap::new` and `ArrowBytesViewMap::new` to capture the pre-allocated 
memory.
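
   The accounting gap can be illustrated with a minimal, standard-library-only 
sketch. This is not the DataFusion code: `TrackedTable`, `new_undercounting`, 
`new_fixed`, and the value of `INITIAL_MAP_CAPACITY` are hypothetical, a `Vec` 
stands in for the hash table, and `capacity * size_of::<u64>()` stands in for 
`allocation_size()`. It only shows why starting `map_size` at `0` leaves the 
pre-allocated memory uncounted when growth is tracked incrementally.

```rust
use std::mem::size_of;

// Assumed value, for illustration only; not taken from the PR.
const INITIAL_MAP_CAPACITY: usize = 128;

// Hypothetical stand-in for a map that tracks its own memory usage.
struct TrackedTable {
    slots: Vec<u64>,
    map_size: usize,
}

impl TrackedTable {
    // Stand-in for the table's `allocation_size()`: bytes currently reserved.
    fn allocation_size(slots: &Vec<u64>) -> usize {
        slots.capacity() * size_of::<u64>()
    }

    // Before the fix: memory is pre-allocated, but the counter starts at 0.
    fn new_undercounting() -> Self {
        Self {
            slots: Vec::with_capacity(INITIAL_MAP_CAPACITY),
            map_size: 0,
        }
    }

    // After the fix: the counter starts at the size of the initial allocation.
    fn new_fixed() -> Self {
        let slots = Vec::with_capacity(INITIAL_MAP_CAPACITY);
        let map_size = Self::allocation_size(&slots);
        Self { slots, map_size }
    }

    // Only growth caused by this insert is added to the counter.
    fn insert_accounted(&mut self, value: u64) {
        let before = Self::allocation_size(&self.slots);
        self.slots.push(value);
        let after = Self::allocation_size(&self.slots);
        self.map_size += after - before;
    }

    fn size(&self) -> usize {
        self.map_size
    }
}
```

   In this sketch, a table built with `new_undercounting()` reports `0` bytes 
while already holding the initial allocation, and it stays short by that amount 
as it grows. A table built with `new_fixed()` reports the full reserved size 
from the start.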
   
   Two other uses of `map_size: 0` in the codebase (`row.rs` and 
`multi_group_by/mod.rs`) use `HashTable::with_capacity(0)` which allocates 
nothing, so they are already correct and unchanged.
   
   ## Are these changes tested?
   
   The change is a one-line initialization fix in each of the two 
constructors. Existing tests cover the `size()` method behavior.
   
   ## Are there any user-facing changes?
   
   No API changes. `size()` now returns a more accurate memory estimate that 
includes the initial hash table allocation.



