Dandandan commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3903351150
Thanks for the long response @adriangb! That seems like a good way forward.

I think w.r.t. slowdowns (this epic), so far I have seen 2 issues identified as regressions:

* overhead of filter pushdown on non-effective static filters during the parquet scan itself (addressed by e.g. https://github.com/apache/arrow-rs/pull/9414)
* overhead of expression evaluation from dynamic filters (effective or not) during the scan. This one is a bit tougher to get rid of entirely, as it would require quickly deciding that a filter is not worth it.

For the purpose of getting filter pushdown in as quickly as possible, I think this is the path most likely to let us enable parquet pushdown soon:

1. Make adaptive filters pruning-only for the moment (behind a flag) and only push down static filters to the parquet reader
2. Enable something like https://github.com/apache/arrow-rs/pull/9414 to get rid of the overhead of non-effective static filters
3. (... see if there are any other regressions)
4. Incrementally improve dynamic/adaptive filters to benefit from the selective ones with low evaluation cost

WDYT?
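The second regression and step 4 both hinge on cheaply deciding at runtime whether a dynamic filter is worth evaluating per row. Below is a minimal sketch of one possible heuristic, not DataFusion's or arrow-rs's actual mechanism: all names (`FilterStats`, `Policy`, `Decision`) and the threshold values are hypothetical. It tracks rows in/out and evaluation time per filter, and once enough rows have been seen, keeps row-level evaluation only for filters that are both selective and cheap, demoting the rest to pruning-only.

```rust
/// Hypothetical per-filter statistics gathered during the scan.
/// Not a DataFusion or arrow-rs API; illustration only.
#[derive(Debug, Default, Clone, Copy)]
pub struct FilterStats {
    rows_in: u64,
    rows_out: u64,
    eval_nanos: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    /// Too few rows observed to judge the filter yet.
    KeepMeasuring,
    /// Selective and cheap: worth evaluating row by row.
    KeepRowLevel,
    /// Unselective or expensive: use only for pruning (e.g. statistics).
    PruningOnly,
}

/// Hypothetical tuning knobs; values used below are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct Policy {
    pub min_rows: u64,
    /// Minimum fraction of input rows the filter must remove (0.0..=1.0).
    pub min_pruned_fraction: f64,
    /// Maximum acceptable evaluation cost per input row, in nanoseconds.
    pub max_nanos_per_row: f64,
}

impl FilterStats {
    /// Record one evaluated batch: rows seen, rows kept, time spent.
    pub fn record(&mut self, rows_in: u64, rows_out: u64, eval_nanos: u64) {
        debug_assert!(rows_out <= rows_in);
        self.rows_in += rows_in;
        self.rows_out += rows_out;
        self.eval_nanos += eval_nanos;
    }

    /// Fraction of input rows removed by the filter so far.
    pub fn pruned_fraction(&self) -> f64 {
        if self.rows_in == 0 {
            return 0.0;
        }
        1.0 - self.rows_out as f64 / self.rows_in as f64
    }

    /// Average evaluation cost per input row so far.
    pub fn nanos_per_row(&self) -> f64 {
        if self.rows_in == 0 {
            return 0.0;
        }
        self.eval_nanos as f64 / self.rows_in as f64
    }

    pub fn decide(&self, policy: &Policy) -> Decision {
        if self.rows_in < policy.min_rows {
            Decision::KeepMeasuring
        } else if self.pruned_fraction() >= policy.min_pruned_fraction
            && self.nanos_per_row() <= policy.max_nanos_per_row
        {
            Decision::KeepRowLevel
        } else {
            Decision::PruningOnly
        }
    }
}

fn main() {
    let policy = Policy { min_rows: 10_000, min_pruned_fraction: 0.2, max_nanos_per_row: 20.0 };

    // Selective and cheap (~5 ns/row, ~88% of rows removed).
    let mut selective = FilterStats::default();
    selective.record(8_192, 1_000, 40_960);
    selective.record(8_192, 900, 40_960);
    println!("selective: {:?}", selective.decide(&policy));

    // Cheap but removes almost nothing.
    let mut weak = FilterStats::default();
    weak.record(16_384, 16_000, 16_384);
    println!("weak: {:?}", weak.decide(&policy));
}
```

Requiring a minimum row count before deciding avoids flip-flopping on the first small batch; a real implementation would also need to handle filters whose selectivity changes during the scan (e.g. dynamic filters that tighten as a join build side completes).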
