Dandandan commented on issue #3463:
URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3841024340

   Thinking about it a bit more, I think the fastest way forward is to disable
evaluation of the dynamic filter predicate in the scan / predicate pushdown:
   
   * Evaluating the big expression
`case hashes % partitions when 0 then (...) AND lookup when 1 ...` is expensive,
even with optimizations, because the branch expressions also differ from each
other (they are based on the min/max values in the hash maps). I think we can
reduce this cost by comparing against the global/combined statistics rather than
per-partition statistics, and by implementing a fast way to check a batch of
hashes against *n* tables.
   * The expression and `contains_hashes` will always have some overhead, so we
first need a smart way to dynamically disable the filter when it filters out
nothing or only a little.
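
As a rough illustration of the two points above, here is a minimal standalone
sketch. It is not DataFusion code: `hash_key`, `BuildSide`, `contains_batch`,
`AdaptiveFilter`, and the threshold/window parameters are all hypothetical names
made up for this example, and plain `i64` keys stand in for Arrow arrays. It
shows a single global min/max check followed by one lookup in the table selected
by `hash % n`, plus a wrapper that switches the filter off once it prunes too
little.

```rust
use std::collections::HashSet;

/// Hypothetical hash function for this sketch (not the one DataFusion uses).
fn hash_key(key: i64) -> u64 {
    (key as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Hypothetical build side: one hash set per partition plus combined bounds.
struct BuildSide {
    tables: Vec<HashSet<u64>>,
    bounds: Option<(i64, i64)>, // global min/max over all partitions
}

impl BuildSide {
    fn new(keys: &[i64], partitions: usize) -> Self {
        assert!(partitions > 0, "need at least one partition");
        let mut tables: Vec<HashSet<u64>> = vec![HashSet::new(); partitions];
        for &key in keys {
            let hash = hash_key(key);
            tables[(hash % partitions as u64) as usize].insert(hash);
        }
        let bounds = match (keys.iter().min(), keys.iter().max()) {
            (Some(&min), Some(&max)) => Some((min, max)),
            _ => None, // empty build side: nothing can match
        };
        BuildSide { tables, bounds }
    }

    /// Checks a whole batch: one global range test per key, then a single
    /// lookup in the table picked by `hash % n` (no per-partition branches).
    fn contains_batch(&self, keys: &[i64]) -> Vec<bool> {
        let n = self.tables.len() as u64;
        keys.iter()
            .map(|&key| match self.bounds {
                Some((min, max)) if key >= min && key <= max => {
                    let hash = hash_key(key);
                    self.tables[(hash % n) as usize].contains(&hash)
                }
                _ => false,
            })
            .collect()
    }
}

/// Tracks how many rows the filter prunes and disables it when the pruned
/// fraction over a window of rows falls below a threshold.
struct AdaptiveFilter {
    min_pruned_fraction: f64,
    window_rows: usize,
    seen: usize,
    pruned: usize,
    enabled: bool,
}

impl AdaptiveFilter {
    fn new(min_pruned_fraction: f64, window_rows: usize) -> Self {
        AdaptiveFilter {
            min_pruned_fraction,
            window_rows: window_rows.max(1), // avoid a zero-length window
            seen: 0,
            pruned: 0,
            enabled: true,
        }
    }

    /// Returns the keys that survive the filter, or the batch unchanged
    /// once the filter has been disabled.
    fn apply(&mut self, build: &BuildSide, batch: &[i64]) -> Vec<i64> {
        if !self.enabled {
            return batch.to_vec();
        }
        let mask = build.contains_batch(batch);
        let mut kept = Vec::new();
        for (&key, &hit) in batch.iter().zip(mask.iter()) {
            if hit {
                kept.push(key);
            }
        }
        self.seen += batch.len();
        self.pruned += batch.len() - kept.len();
        if self.seen >= self.window_rows {
            let fraction = self.pruned as f64 / self.seen as f64;
            self.enabled = fraction >= self.min_pruned_fraction;
            self.seen = 0;
            self.pruned = 0;
        }
        kept
    }
}
```

In this sketch the filter stays off once disabled. A real implementation might
re-sample periodically, and would still need to verify actual key equality in
the join itself, since only hashes are stored here.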


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

