mbutrovich commented on issue #7955:
URL: https://github.com/apache/datafusion/issues/7955#issuecomment-2830801221

   Now that #15568 is in, what is a reasonable approach to sideways information 
passing (SIP) with bloom filters?
   - Modify `HashJoinExec` to build a bloom filter on the build side and, once 
the build completes, call `DynamicFilterPhysicalExpr::update`.
   - How do we represent the bloom filter test as an expression? Is it a new 
`PhysicalExpr`, or do we define a `ScalarUDF` for 
`bloom_filter_contains(bloom_filter, input)` and use a `ScalarFunctionExpr`? 
I'm not convinced by the `ScalarUDF` approach, for two reasons: 1) taking the 
bloom filter bytes as an argument would require constructing and validating a 
bloom filter from those bytes at every invocation, right? 2) I'm not sure we 
want to expose this as a SQL function.
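
To make the first concern concrete, here is a minimal standalone sketch of the "dedicated expression" shape, assuming `i64` join keys. It does not use any DataFusion API: `BloomFilter`, `BloomFilterContains`, `probe_positions`, and `evaluate` are hypothetical names invented for illustration, and the double-hashing scheme is just one simple choice. The point is only that the filter is built once on the build side and then shared behind an `Arc`, so evaluating a batch of probe keys never re-parses or re-validates serialized bytes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical, minimal bloom filter over a bit vector (illustration only).
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

/// Derive `num_hashes` probe positions from two base hashes (double hashing).
fn probe_positions(value: i64, num_bits: u64, num_hashes: u32) -> impl Iterator<Item = u64> {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    let h1 = hasher.finish();
    // Second hash: same value plus a fixed salt; forced odd so the stride is non-zero.
    let mut hasher = DefaultHasher::new();
    (value, 0x9E37_79B9_u32).hash(&mut hasher);
    let h2 = hasher.finish() | 1;
    (0..u64::from(num_hashes)).map(move |i| h1.wrapping_add(i.wrapping_mul(h2)) % num_bits)
}

impl BloomFilter {
    fn new(num_bits: u64, num_hashes: u32) -> Self {
        // Use at least one 64-bit word so the modulo is never by zero.
        let num_bits = num_bits.max(64);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Called for every key seen on the build side.
    fn insert(&mut self, value: i64) {
        for pos in probe_positions(value, self.num_bits, self.num_hashes) {
            self.bits[(pos / 64) as usize] |= 1u64 << (pos % 64);
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    fn contains(&self, value: i64) -> bool {
        probe_positions(value, self.num_bits, self.num_hashes)
            .all(|pos| (self.bits[(pos / 64) as usize] & (1u64 << (pos % 64))) != 0)
    }
}

/// Hypothetical stand-in for a dedicated expression: it owns a handle to an
/// already-built filter, so each evaluation is only a series of bit probes.
struct BloomFilterContains {
    filter: Arc<BloomFilter>,
}

impl BloomFilterContains {
    /// Returns one boolean per probe key, usable as a selection mask.
    fn evaluate(&self, probe_keys: &[i64]) -> Vec<bool> {
        probe_keys.iter().map(|k| self.filter.contains(*k)).collect()
    }
}

fn main() {
    // Build side: populate the filter once.
    let mut filter = BloomFilter::new(1024, 3);
    for key in [1_i64, 7, 42] {
        filter.insert(key);
    }

    // Probe side: share the finished filter and test a batch of keys.
    let expr = BloomFilterContains { filter: Arc::new(filter) };
    let mask = expr.evaluate(&[1, 2, 42]);

    // Inserted keys are never reported absent; key 2 may be a false positive.
    assert!(mask[0] && mask[2]);
    println!("{:?}", mask);
}
```

A `ScalarUDF` that receives the filter as a bytes argument would instead have to rebuild something like `BloomFilter` from those bytes inside every call, unless it added its own caching.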
   
   I'd like to start tackling this in pieces next week, but I suspect others 
already have thoughts (maybe even progress) in this area.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
