Dandandan commented on issue #17171: URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3280423699
Hmm, the first source doesn't refer to pushing down a bloom filter. It seems to discuss different hash join data structures instead, which I don't think is very relevant to pushing down the join as a filter. (DF uses a combination of the hashbrown table implementation and a chained data structure, followed by filtering out any hash collisions. That is not always optimal, but it is not relevant here.)

Note that for filtering we only need the table, i.e. "does the map probably contain this key/keys?", so it only needs to hash and look up in the table once, similar to a bloom filter. Hashbrown also scans multiple entries at once in a bloom-filter-like manner (with a low collision rate), which I think would be hard to beat with a bloom filter. We could even use hashbrown as a "bloom filter" (skipping the internal collision check for potential matches) if that turns out to be beneficial.
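To make the "table as a bloom filter" idea concrete, here is a minimal sketch, not DataFusion's actual code. It keeps only the 64-bit hashes of the build-side keys and never compares the keys themselves, so a probe costs one hash plus one lookup and can return false positives but no false negatives. The names `ApproxKeyFilter`, `hash_key`, and `filter_probe_side` are hypothetical, and `std::collections::HashSet` (which is itself backed by a hashbrown port) stands in for using hashbrown directly.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a join key down to 64 bits.
fn hash_key<K: Hash>(key: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical approximate membership filter built from build-side keys.
/// Only the hashes are stored; the keys are dropped, so two different keys
/// with the same 64-bit hash are indistinguishable (a false positive).
struct ApproxKeyFilter {
    // Simplification: std's HashSet hashes each u64 again internally.
    hashes: HashSet<u64>,
}

impl ApproxKeyFilter {
    /// Build the filter from the build side of the join.
    fn from_build_side<K: Hash>(keys: &[K]) -> Self {
        Self {
            hashes: keys.iter().map(|k| hash_key(k)).collect(),
        }
    }

    /// "Does the map probably contain this key?": one hash, one lookup,
    /// and no key equality check.
    fn might_contain<K: Hash>(&self, key: &K) -> bool {
        self.hashes.contains(&hash_key(key))
    }
}

/// Keep only probe-side keys that might have a match on the build side.
/// Rows that survive still go through the real join, which resolves any
/// false positives.
fn filter_probe_side<'a, K: Hash>(filter: &ApproxKeyFilter, probe: &'a [K]) -> Vec<&'a K> {
    probe.iter().filter(|k| filter.might_contain(*k)).collect()
}
```

Building the filter from the keys `[1, 5, 9]` and probing with `[0, 1, 2, 5, 9, 10]` is guaranteed to keep `1`, `5`, and `9`; any other key that survives would be a hash collision, which the join itself filters out afterwards.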