Re: [PR] Hash join buffering on probe side [datafusion]

via GitHub Mon, 02 Mar 2026 23:01:51 -0800


gabotechs commented on PR #19761:
URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3989100294


   > The tradeoff here is reapplying filters to in-memory batches vs getting
   > filtered I/O from the start, but having to wait for it. This should be fine in
   > most cases: the buffer fill happens concurrently with the build so the
   > unfiltered I/O costs no wall clock time, evaluating a filter on in-memory
   > batches is cheap compared to storage I/O, and the buffer is bounded by
   > hash_join_buffering_capacity so the unfiltered portion is small.
   
   This sounds complex to do, but I can imagine it yielding good results. I'd
   suggest exploring this in a follow-up PR rather than in this one, though.
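A minimal sketch of the quoted tradeoff, for illustration only. It assumes the dynamic filter is a simple [min, max] range of build-side keys and models hash_join_buffering_capacity as the bound of a std::sync::mpsc::sync_channel. Applying the filter inside the producer thread stands in for "filtered I/O". `Batch`, `RangeFilter` and `hash_join_with_probe_buffering` are hypothetical names, not DataFusion APIs. The producer starts before the build, so up to `buffer_capacity` unfiltered batches can queue while the hash table is built. Once the filter is published, the consumer reapplies it in memory to everything it drains.

```rust
use std::collections::HashSet;
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, OnceLock};
use std::thread;

/// Hypothetical stand-in for a record batch: just the probe-side join keys.
type Batch = Vec<i64>;

/// Hypothetical dynamic filter: the [min, max] range of build-side keys.
#[derive(Clone, Copy, Debug)]
struct RangeFilter {
    min: i64,
    max: i64,
}

impl RangeFilter {
    fn from_keys(keys: &HashSet<i64>) -> Option<Self> {
        let min = *keys.iter().min()?;
        let max = *keys.iter().max()?;
        Some(RangeFilter { min, max })
    }

    fn apply(&self, batch: &Batch) -> Batch {
        batch.iter().copied().filter(|k| (self.min..=self.max).contains(k)).collect()
    }
}

/// Returns probe keys that match the build side, in probe order (semi-join style).
fn hash_join_with_probe_buffering(
    build_keys: Vec<i64>,
    probe_batches: Vec<Batch>,
    buffer_capacity: usize,
) -> Vec<i64> {
    let filter_slot: Arc<OnceLock<RangeFilter>> = Arc::new(OnceLock::new());
    // Bounded channel: at most `buffer_capacity` batches sit here unfiltered.
    let (tx, rx) = sync_channel::<Batch>(buffer_capacity);

    let producer_slot = Arc::clone(&filter_slot);
    let producer = thread::spawn(move || {
        for batch in probe_batches {
            // After the filter is published, apply it at the "source"
            // (standing in for filtered I/O); before that, send batches as-is.
            let batch = match producer_slot.get() {
                Some(f) => f.apply(&batch),
                None => batch,
            };
            if tx.send(batch).is_err() {
                break; // consumer hung up (e.g. empty build side)
            }
        }
    });

    // Build phase: runs while the producer fills the bounded buffer.
    let table: HashSet<i64> = build_keys.into_iter().collect();

    let mut matches = Vec::new();
    if let Some(filter) = RangeFilter::from_keys(&table) {
        let _ = filter_slot.set(filter);
        for batch in rx.iter() {
            // Reapply the filter in memory: cheap, and it catches batches that
            // were buffered before the filter existed (idempotent otherwise).
            for key in filter.apply(&batch) {
                if table.contains(&key) {
                    matches.push(key);
                }
            }
        }
    } else {
        drop(rx); // empty build side: an inner join produces no rows
    }
    producer.join().expect("producer thread panicked");
    matches
}
```

For example, with build keys `[3, 5, 7]`, probe batches `[[1, 3, 4], [5, 9], [7, 7, 2]]` and a capacity of 1, the call returns `[3, 5, 7, 7]`. How many batches reach the consumer unfiltered depends on thread timing, but the reapplied filter and the hash probe make the result the same either way.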


