[I] Accurately reserve memory in the build side of hash joins [datafusion]

via GitHub Tue, 09 Jun 2026 15:29:57 -0700


jordepic opened a new issue, #22861:
URL: https://github.com/apache/datafusion/issues/22861


   ### Describe the bug
   
   HashJoinExec's build side reserves get_record_batch_memory_size(&batch) per 
collected batch. That function deduplicates shared buffers only within one 
batch, so when the build input emits zero-copy slices of one larger batch — as 
GroupedHashAggregateStream does when emitting its result in batch_size chunks — 
every slice is charged the full parent allocation. An aggregate output of S 
bytes in n slices reserves n × S for S bytes of physical memory; since the 
build collection cannot spill, this aborts queries that fit in memory with 
large headroom. 
   
   Observed in DataFusion Comet: 26GB reserved for 136MB resident (1.63M-row 
build side, ~200 slices), failing against a 16GiB pool share. Reporting each 
slice's sliced size instead would under-count, because a single slice keeps 
the entire parent buffer alive via Arc. The correct measure for the collection 
is therefore the union of unique buffers it retains.
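   
   The three accounting choices discussed above can be compared with a small 
standalone model. This is only a sketch under stated assumptions: `Slice`, 
`slices_of`, and the `charged_*` functions are hypothetical names for 
illustration, an `Arc<Vec<u8>>` stands in for a shared Arrow buffer, and none 
of this is DataFusion's or arrow-rs's actual API.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Toy stand-in for a zero-copy slice: a shared allocation plus a window
/// into it. Hypothetical type, not an arrow-rs or DataFusion type.
#[derive(Clone)]
struct Slice {
    parent: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl Slice {
    /// The bytes this slice actually exposes.
    fn bytes(&self) -> &[u8] {
        &self.parent[self.offset..self.offset + self.len]
    }
}

/// Split one allocation into zero-copy slices of at most `chunk` bytes.
/// `chunk` must be non-zero (`step_by(0)` panics).
fn slices_of(parent: &Arc<Vec<u8>>, chunk: usize) -> Vec<Slice> {
    (0..parent.len())
        .step_by(chunk)
        .map(|offset| Slice {
            parent: Arc::clone(parent),
            offset,
            len: chunk.min(parent.len() - offset),
        })
        .collect()
}

/// Per-slice accounting: every slice is charged its whole parent
/// allocation, so n slices of an S-byte parent are charged n * S.
fn charged_per_slice(slices: &[Slice]) -> usize {
    slices.iter().map(|s| s.parent.len()).sum()
}

/// Sliced-size accounting: every slice is charged only its own window.
/// This under-counts when a few slices keep a large parent alive.
fn charged_sliced(slices: &[Slice]) -> usize {
    slices.iter().map(|s| s.bytes().len()).sum()
}

/// Union accounting: every distinct allocation is charged once,
/// identified by the address of the shared allocation.
fn charged_unique(slices: &[Slice]) -> usize {
    let mut seen = HashSet::new();
    slices
        .iter()
        .filter(|s| seen.insert(Arc::as_ptr(&s.parent)))
        .map(|s| s.parent.len())
        .sum()
}
```

   For a 1000-byte parent split into ten 100-byte slices, `charged_per_slice` 
gives 10000 while `charged_sliced` and `charged_unique` both give 1000. If 
only the first slice is retained, `charged_sliced` drops to 100 although the 
full 1000 bytes stay allocated, and `charged_unique` still reports 1000. A 
real fix would presumably key on the underlying Arrow buffer's data pointer 
and track the seen set across all collected batches; that mapping is an 
assumption here, not something this report specifies.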
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_



