notashes commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3914037886
> I am wondering if for those queries, a conservative heuristic would be also to always put dynamic filters after the static filters (regardless of column size), so the overhead of pushing down bad dynamic filters won't be as bad. It might regress some good TopK predicates though.

I actually worked out a solution where, instead of putting dynamic filters after static ones, we simply defer the expensive static string predicate out of `RowFilter` entirely. This also avoids regressing good TopK predicates. The dynamic filter stays in place, converges on the cheap `EventTime` column, and gets to prune rows.

It's a pretty narrow heuristic, in that it only defers `col != literal` predicates on `string/binary` columns, but it shows an improvement over the baseline with pushdown both off and on. Q24: baseline off `~0.266s`, baseline on `~0.299s` (the regression), with my patch `~0.197s`.

@Dandandan would appreciate your thoughts on this: #20413
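A rough sketch of the deferral rule described above, for illustration only. It does not use DataFusion or parquet APIs and is not the code in #20413: `Predicate`, `ColumnType`, `Op`, `should_defer`, and `split_predicates` are all hypothetical names, and the column `text_col` in the usage note is made up. The assumption is simply that each pushed-down predicate can be classified by column type, operator, and whether it is a dynamic filter.

```rust
// Hypothetical, simplified model of pushed-down predicates.
// None of these types are DataFusion or parquet APIs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Timestamp,
    Utf8,
    Binary,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    NotEq,
    Gt,
}

#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    column: String,
    column_type: ColumnType,
    op: Op,
    rhs_is_literal: bool, // true for `col <op> literal`
    is_dynamic: bool,     // true for TopK-style dynamic filters
}

// Narrow heuristic: defer only static `col != literal` predicates
// on string or binary columns. Everything else stays in the row filter.
fn should_defer(p: &Predicate) -> bool {
    !p.is_dynamic
        && p.rhs_is_literal
        && p.op == Op::NotEq
        && matches!(p.column_type, ColumnType::Utf8 | ColumnType::Binary)
}

// Split predicates into (kept in the row filter, deferred to a later
// filtering step), preserving the original order within each group.
fn split_predicates(preds: Vec<Predicate>) -> (Vec<Predicate>, Vec<Predicate>) {
    preds.into_iter().partition(|p| !should_defer(p))
}
```

Given a dynamic `EventTime > ...` filter and a static `text_col != ''` predicate on a string column, `split_predicates` would keep the former in the row-filter group and move the latter to the deferred group, so the cheap dynamic filter is evaluated during decoding while the string comparison runs afterwards on the surviving rows.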
