Dandandan commented on issue #3463: URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3841024340
Thinking about it a bit more, I think the fastest way forward is to disable evaluation of the dynamic filter predicate pushdown in the scan / predicate pushdown:

* Evaluating the large expression `case hashes % partitions when 0 then (...) AND lookup when 1 ...` is expensive, even with optimizations, partly because the branch expressions differ per partition (each is built from the min/max values of that partition's hash map). I think we can reduce this cost by comparing against the global (combined) statistics rather than per-partition statistics, and by implementing a fast way to check a batch of hashes against *n* tables.
* The expression and `contains_hashes` will always add some overhead, so we first need a way to disable them dynamically when they filter out nothing, or very little.
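A minimal sketch of the first idea, written against plain Rust types rather than DataFusion's real expression machinery. `PartitionSummary`, `combined_bounds` and `filter_batch` are hypothetical names, and a `HashSet<u64>` stands in for each partition's join hash table. The per-partition min/max values are folded into one global range, so every row pays for a single range check instead of a per-partition `CASE` branch, and the hash probe for a whole batch runs in one loop over the *n* tables (partition chosen as `hash % n`, mirroring the expression above).

```rust
use std::collections::HashSet;

/// Hypothetical build-side summary for one partition: min/max join key plus
/// the set of key hashes (a stand-in for the real join hash table).
struct PartitionSummary {
    min: i64,
    max: i64,
    hashes: HashSet<u64>,
}

/// Fold per-partition min/max into one global (combined) range.
/// Returns `None` when there are no partitions.
fn combined_bounds(parts: &[PartitionSummary]) -> Option<(i64, i64)> {
    parts.iter().fold(None, |acc, p| match acc {
        None => Some((p.min, p.max)),
        Some((lo, hi)) => Some((lo.min(p.min), hi.max(p.max))),
    })
}

/// Evaluate the filter for a whole batch in one pass: a cheap global range
/// check first, then a probe of the table for partition `hash % n`.
fn filter_batch(keys: &[i64], hashes: &[u64], parts: &[PartitionSummary]) -> Vec<bool> {
    let Some((lo, hi)) = combined_bounds(parts) else {
        // No build-side partitions: no probe row can match.
        return vec![false; keys.len()];
    };
    let n = parts.len() as u64; // non-zero here, so `% n` cannot panic
    keys.iter()
        .zip(hashes)
        .map(|(&k, &h)| k >= lo && k <= hi && parts[(h % n) as usize].hashes.contains(&h))
        .collect()
}

fn main() {
    // Toy data: hashes are hand-picked so that each one sits in partition `h % 2`.
    let parts = vec![
        PartitionSummary { min: 1, max: 10, hashes: HashSet::from([2u64, 4]) },
        PartitionSummary { min: 20, max: 30, hashes: HashSet::from([5u64]) },
    ];
    let keys: [i64; 4] = [5, 25, 100, 7];
    let hashes: [u64; 4] = [4, 5, 3, 7];
    println!("{:?}", filter_batch(&keys, &hashes, &parts));
}
```

The trade-off is precision: the combined range `[1, 30]` admits a key like 15 that no single partition covers, so fewer rows are pruned by the range check alone, in exchange for one uniform, branch-free comparison per row. The hash probe still rejects such rows.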

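The second point, switching the filter off at runtime when it stops paying for itself, can be sketched as a small selectivity tracker. This is an assumption about one possible design, not an existing DataFusion API: `AdaptiveFilter`, `min_rows` and `max_pass_ratio` are hypothetical. It counts rows in and rows kept, and once it has seen at least `min_rows` rows it disables itself if the fraction of rows passing exceeds `max_pass_ratio`. Skipping the filter is safe because a dynamic filter only prunes rows early; the join itself still produces the correct result.

```rust
/// Hypothetical tracker deciding whether a dynamic filter is still worth evaluating.
struct AdaptiveFilter {
    rows_in: u64,
    rows_out: u64,
    min_rows: u64,       // warm-up: rows to observe before deciding
    max_pass_ratio: f64, // disable if more than this fraction of rows passes
    enabled: bool,
}

impl AdaptiveFilter {
    fn new(min_rows: u64, max_pass_ratio: f64) -> Self {
        Self { rows_in: 0, rows_out: 0, min_rows, max_pass_ratio, enabled: true }
    }

    /// Apply `predicate` to a batch unless the filter has been disabled,
    /// in which case every row passes without paying the evaluation cost.
    fn apply<F: Fn(i64) -> bool>(&mut self, batch: &[i64], predicate: F) -> Vec<i64> {
        if !self.enabled {
            return batch.to_vec();
        }
        let kept: Vec<i64> = batch.iter().copied().filter(|&v| predicate(v)).collect();
        self.rows_in += batch.len() as u64;
        self.rows_out += kept.len() as u64;
        if self.rows_in >= self.min_rows {
            let pass_ratio = self.rows_out as f64 / self.rows_in as f64;
            if pass_ratio > self.max_pass_ratio {
                // The filter removes too little to justify its overhead.
                self.enabled = false;
            }
        }
        kept
    }
}

fn main() {
    // Decide after 8 rows; disable if more than 80% of rows pass.
    let mut f = AdaptiveFilter::new(8, 0.8);
    let pred = |v: i64| v != 3; // a predicate that filters out almost nothing
    let a = f.apply(&[1, 2, 3, 4], pred);
    let b = f.apply(&[5, 6, 7, 8], pred); // 7 of 8 rows passed: filter turns off
    let c = f.apply(&[3], pred); // no longer evaluated, so 3 passes through
    println!("{:?} {:?} {:?} enabled={}", a, b, c, f.enabled);
}
```

For simplicity this sketch never re-enables the filter; a real implementation might periodically re-sample, since selectivity can change as the scan progresses.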