Re: [I] Push down entire hash table from HashJoinExec into scans [datafusion]

via GitHub Wed, 17 Sep 2025 23:00:17 -0700


Dandandan commented on issue #17171:
URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3280423699


   Hmm, the first source doesn't refer to pushing down a bloom filter; it seems 
to discuss different hash join data structures. DF uses a combination of the 
hashbrown table implementation and a chained data structure (not always 
optimal, but not relevant here), followed by filtering out any hash collisions. 
I don't think that is very relevant here for pushing down the join as a filter.
   
   Note that for filtering we only need the table itself, i.e. "does the map 
probably contain this key (or these keys)?", so a probe only needs to hash and 
look up in the table once, similar to a bloom filter. Hashbrown also scans 
multiple entries at once in a bloom-filter-like manner (with a low collision 
rate), which I think would be hard to beat with a bloom filter. We could even 
use hashbrown as a "bloom filter" (skipping the internal collision check for 
potential matches) if that turns out to be beneficial.
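
   To make the "hash table as an approximate filter" idea concrete, here is a 
minimal sketch. It is not DataFusion's implementation: `HashOnlyFilter`, 
`hash_key` and `prefilter_probe` are hypothetical names, and it uses the 
standard library `HashSet` (which is built on hashbrown) rather than the join's 
actual table. It keeps only the 64-bit hashes of the build-side keys, so a 
lookup hashes once, probes once, and never compares the original keys. That 
means false positives are possible but false negatives are not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a join key with a fixed-key hasher so that the
/// build side and the probe side produce the same hash for equal keys.
fn hash_key<K: Hash>(key: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical approximate membership filter over build-side join keys.
/// Only the hashes are stored, so two different keys with the same 64-bit
/// hash are indistinguishable (the "skipped collision check").
struct HashOnlyFilter {
    hashes: HashSet<u64>,
}

impl HashOnlyFilter {
    /// Build the filter from the build-side keys.
    fn from_build_keys<K: Hash>(keys: &[K]) -> Self {
        Self {
            hashes: keys.iter().map(|k| hash_key(k)).collect(),
        }
    }

    /// "Does the build side probably contain this key?"
    /// One hash computation plus one table lookup.
    fn might_contain<K: Hash>(&self, key: &K) -> bool {
        self.hashes.contains(&hash_key(key))
    }
}

/// Keep only the probe-side keys that might have a match on the build side.
/// The join itself still has to verify real key equality afterwards.
fn prefilter_probe<K: Hash + Clone>(filter: &HashOnlyFilter, probe: &[K]) -> Vec<K> {
    probe
        .iter()
        .filter(|k| filter.might_contain(*k))
        .cloned()
        .collect()
}
```

   A scan could call something like `prefilter_probe` on the join key column 
before handing rows to the join. Any row that slips through because of a hash 
collision would still be rejected by the join's own equality check.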
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

