mbutrovich commented on issue #7955: URL: https://github.com/apache/datafusion/issues/7955#issuecomment-2830801221
So now that #15568 is in, what is a reasonable approach to doing SIP (sideways information passing) with bloom filters?

- Modify `HashJoinExec` to build a bloom filter on the build side and, once it is complete, call `DynamicFilterPhysicalExpr::update`.
- How do we represent the bloom filter test as an expression? Is it a new `PhysicalExpr`, or do we define a `ScalarUDF` for `bloom_filter_contains(bloom_filter, input)` and use a `ScalarFunctionExpr`? I'm not convinced by the `ScalarUDF` approach for two reasons:
  1. Taking the bloom filter bytes as an argument would require constructing and validating a bloom filter from those bytes on every invocation, right?
  2. I'm not sure we want to expose this as a SQL function.

I'm interested in starting to tackle this in pieces next week, but I also suspect others already have thoughts (maybe even progress) in this area.
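To make the "dedicated expression" option concrete, here is a minimal, self-contained sketch. It assumes the filter is built once on the build side and shared with the probe side behind an `Arc`. `BloomFilter`, `build_filter`, and `BloomFilterContainsExpr` are hypothetical names. The sketch does not implement DataFusion's real `PhysicalExpr` trait or call `DynamicFilterPhysicalExpr::update`, and the hashing (std's `DefaultHasher` with double hashing) is only a stand-in for whatever hash the join would actually use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical minimal Bloom filter over a bit array (not a DataFusion/arrow-rs type).
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        let words = (num_bits + 63) / 64;
        Self { bits: vec![0; words.max(1)], num_bits: (words.max(1) * 64) as u64, num_hashes }
    }

    /// Two independent-ish hashes; probe i uses h1 + i * h2 (double hashing).
    fn hashes<T: Hash>(value: &T) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        value.hash(&mut a);
        let mut b = DefaultHasher::new();
        0x9e37_79b9_u64.hash(&mut b); // salt so the second hash differs from the first
        value.hash(&mut b);
        (a.finish(), b.finish() | 1) // odd step avoids degenerate probe sequences
    }

    fn bit(&self, h1: u64, h2: u64, i: u64) -> u64 {
        h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        let (h1, h2) = Self::hashes(value);
        for i in 0..self.num_hashes {
            let b = self.bit(h1, h2, i);
            self.bits[(b / 64) as usize] |= 1u64 << (b % 64);
        }
    }

    /// False means "definitely absent"; true means "possibly present".
    fn might_contain<T: Hash>(&self, value: &T) -> bool {
        let (h1, h2) = Self::hashes(value);
        (0..self.num_hashes).all(|i| {
            let b = self.bit(h1, h2, i);
            (self.bits[(b / 64) as usize] & (1u64 << (b % 64))) != 0
        })
    }
}

/// Build side: insert every join key once, after the hash table is built.
fn build_filter(build_keys: &[i64]) -> BloomFilter {
    // ~10 bits per key with 7 hashes gives roughly a 1% false-positive rate.
    let mut f = BloomFilter::new(build_keys.len().max(1) * 10, 7);
    for k in build_keys {
        f.insert(k);
    }
    f
}

/// Hypothetical probe-side expression: it holds an already-constructed filter,
/// so evaluating a batch only probes bits and never re-parses filter bytes.
struct BloomFilterContainsExpr {
    filter: Arc<BloomFilter>,
}

impl BloomFilterContainsExpr {
    /// Evaluate over one batch of probe-side keys, returning a keep/drop mask.
    fn evaluate(&self, keys: &[i64]) -> Vec<bool> {
        keys.iter().map(|k| self.filter.might_contain(k)).collect()
    }
}

fn main() {
    let expr = BloomFilterContainsExpr { filter: Arc::new(build_filter(&[1, 5, 42, 1000])) };
    let mask = expr.evaluate(&[5, 42, 7]);
    // Build-side keys always pass; 7 is dropped unless it hits a false positive.
    assert!(mask[0] && mask[1]);
    println!("{:?}", mask);
}
```

The point this illustrates is concern 1) above: if the filter is an argument to a `ScalarUDF` as raw bytes, each invocation has to rebuild and validate it. An expression that owns the constructed filter pays that cost once, when the build side finishes, and it never surfaces as a SQL-callable function.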