Dandandan commented on issue #3463:
URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3839311050

   I think I have mostly tracked down the slowdown on TPC-H with (dynamic) 
filter pushdown enabled:
   
   * Dynamic filter pushdown creates a long `case when` expression based on the 
number of target partitions. This accounts for ~75% or more of the overhead, 
and its evaluation cost grows with the number of partitions.
   Evaluating this expression could be optimized to use direct indexing, which 
should make the overhead very small (I'll be creating a PR for this later this 
week; currently fighting some illness).
   * `contains_hashes` is relatively expensive if nothing can be filtered out. 
I think we should disable the hash table pushdown by default for now (only 
enabling the direct indexing ArrayMap if no slowdown happens there) and 
implement some better heuristics to enable / disable the hash table lookup 
(based on some sampling?).
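
   To illustrate the first point, here is a minimal sketch of the difference 
between a `case when` chain and direct indexing. This is not DataFusion's 
actual code: `Bounds`, `partition_of`, `keep_row_case_chain` and 
`keep_row_direct` are hypothetical names, and a simple min/max range stands in 
for whatever filter each partition really carries.

```rust
/// Hypothetical stand-in for one partition's dynamic filter:
/// an inclusive range of join keys seen on the build side.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bounds {
    pub min: i64,
    pub max: i64,
}

impl Bounds {
    fn contains(&self, key: i64) -> bool {
        self.min <= key && key <= self.max
    }
}

/// Hypothetical partitioning function. A real engine would hash the key;
/// a modulo keeps the sketch easy to follow.
pub fn partition_of(key: i64, num_partitions: usize) -> usize {
    key.rem_euclid(num_partitions as i64) as usize
}

/// `case when` style: test the branches one by one until one matches.
/// The work per row grows with the number of partitions.
pub fn keep_row_case_chain(key: i64, filters: &[Bounds]) -> bool {
    let n = filters.len();
    for (p, bounds) in filters.iter().enumerate() {
        // Models `WHEN partition_of(key) = p THEN <filter for p>`.
        if partition_of(key, n) == p {
            return bounds.contains(key);
        }
    }
    false
}

/// Direct indexing: compute the partition once and jump to its filter.
/// The work per row no longer depends on the number of partitions.
pub fn keep_row_direct(key: i64, filters: &[Bounds]) -> bool {
    if filters.is_empty() {
        return false;
    }
    filters[partition_of(key, filters.len())].contains(key)
}
```

   Both functions return the same answer for every key; only the amount of 
work per row differs.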

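   For the second point, one possible shape of a sampling-based heuristic is 
sketched below. Everything here is an assumption for illustration: the function 
name `lookup_pays_off`, the use of a plain `HashSet` in place of the join hash 
table, and the threshold parameter are all hypothetical, not an existing 
DataFusion API.

```rust
use std::collections::HashSet;

/// Probe a small sample of probe-side keys against the build-side keys and
/// report whether enough of them would be filtered out to justify keeping
/// the hash table lookup enabled.
///
/// `min_filtered_fraction` is a hypothetical tuning knob, e.g. 0.1 means
/// "keep the lookup only if at least 10% of sampled rows are dropped".
pub fn lookup_pays_off(
    sample: &[i64],
    build_keys: &HashSet<i64>,
    min_filtered_fraction: f64,
) -> bool {
    if sample.is_empty() {
        // No evidence either way; stay on the cheap path.
        return false;
    }
    // Rows whose key is absent from the build side would be filtered out.
    let filtered = sample.iter().filter(|k| !build_keys.contains(*k)).count();
    (filtered as f64) / (sample.len() as f64) >= min_filtered_fraction
}
```

   When the sample shows that nothing would be filtered, the lookup is skipped 
and its per-row cost is avoided.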

