Dandandan commented on issue #3463: URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3839311050
I think I have mostly tracked down the slowdown of TPCH with (dynamic) filter pushdown:

* Dynamic filter pushdown creates a long `case when` expression whose length depends on the number of target partitions, so its evaluation cost grows with the number of partitions. This accounts for ~75% or more of the overhead. Evaluating this expression could be optimized to use direct indexing, which should make the overhead really small (I'll be creating a PR for this later this week, currently fighting some illness).
* `contains_hashes` is relatively expensive if nothing can be filtered out. I think we should disable the hash table pushdown by default for now (only enable the direct indexing ArrayMap if no slowdown happens there) and implement some better heuristics to enable / disable the hash table lookup (based on some sampling?)
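
To illustrate the first point, here is a minimal standalone sketch of why a per-partition `case when` scales with the partition count while direct indexing does not. It is not DataFusion code: `Bounds`, `partition_of`, `eval_case_when` and `eval_direct_index` are hypothetical names, the per-partition filter is assumed to be a simple min/max range, and a plain modulo stands in for the real hash partitioning.

```rust
/// Hypothetical per-partition filter: an inclusive min/max range on the join key.
/// (Assumption for illustration; the real dynamic filter may differ.)
#[derive(Clone, Copy, Debug)]
struct Bounds {
    min: i64,
    max: i64,
}

impl Bounds {
    fn contains(&self, key: i64) -> bool {
        self.min <= key && key <= self.max
    }
}

/// Stand-in for the repartitioning function (assumption: a plain modulo).
/// Requires `num_partitions > 0`.
fn partition_of(key: i64, num_partitions: usize) -> usize {
    key.rem_euclid(num_partitions as i64) as usize
}

/// `case when` style: every branch is evaluated against the whole batch,
/// so the work is proportional to rows * partitions.
fn eval_case_when(keys: &[i64], filters: &[Bounds]) -> Vec<bool> {
    let n = filters.len();
    let parts: Vec<usize> = keys.iter().map(|&k| partition_of(k, n)).collect();
    let mut out = vec![false; keys.len()];
    for (branch, filter) in filters.iter().enumerate() {
        for (row, &key) in keys.iter().enumerate() {
            if parts[row] == branch {
                out[row] = filter.contains(key);
            }
        }
    }
    out
}

/// Direct indexing: the partition id selects the filter in one step,
/// so the work is proportional to rows only.
fn eval_direct_index(keys: &[i64], filters: &[Bounds]) -> Vec<bool> {
    let n = filters.len();
    keys.iter()
        .map(|&key| filters[partition_of(key, n)].contains(key))
        .collect()
}
```

Both functions return the same mask for a given batch; only the amount of work per row differs, which is the overhead the direct-indexing change would aim to remove.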
