Dandandan commented on issue #20324:
URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3903351150

   Thanks for the long response, @adriangb! That seems like a good way forward.
   
   Regarding slowdowns (this epic), I have so far seen two issues identified 
as regressions:
   
   * overhead of filter pushdown for ineffective static filters during the parquet 
scan itself (addressed by e.g. https://github.com/apache/arrow-rs/pull/9414 )
   * overhead of expression evaluation from dynamic filters (effective or not) 
during the scan. This is a bit tougher to get rid of entirely, as it would 
require quickly deciding that a filter is not worth evaluating.
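
To illustrate what "quickly deciding a filter is not worth it" could look like, here is a minimal sketch. It is not DataFusion or arrow-rs code: `FilterStats`, `record`, `pruned_fraction`, `worth_evaluating` and both thresholds are hypothetical names chosen for this example. It assumes the scan can count rows going into and coming out of a filter, and it ignores evaluation cost, which a real heuristic would likely also weigh.

```rust
/// Hypothetical per-filter counters; not a DataFusion or arrow-rs type.
#[derive(Debug, Default)]
struct FilterStats {
    rows_seen: u64,
    rows_kept: u64,
}

impl FilterStats {
    /// Record one evaluated batch: rows fed to the filter and rows that passed it.
    fn record(&mut self, rows_in: u64, rows_out: u64) {
        self.rows_seen += rows_in;
        self.rows_kept += rows_out;
    }

    /// Fraction of rows the filter has removed so far (0.0 before any batch).
    fn pruned_fraction(&self) -> f64 {
        if self.rows_seen == 0 {
            return 0.0;
        }
        1.0 - (self.rows_kept as f64 / self.rows_seen as f64)
    }

    /// Keep evaluating while still sampling (fewer than `min_rows` seen).
    /// After that, keep the filter only if it prunes at least `min_pruned`.
    fn worth_evaluating(&self, min_rows: u64, min_pruned: f64) -> bool {
        self.rows_seen < min_rows || self.pruned_fraction() >= min_pruned
    }
}
```

A scan could call `record` after each batch and stop evaluating the filter once `worth_evaluating` returns false. The sampling window bounds how much work is spent on a filter that turns out to prune nothing.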
   
   To get filter pushdown in as quickly as possible, I think this is the 
path most likely to let us enable parquet pushdown sometime soon:
   
   1. Make adaptive filters pruning-only for the moment (behind a flag) and 
only push down static filters to the parquet reader
   2. Enable something like https://github.com/apache/arrow-rs/pull/9414 to get 
rid of the overhead of ineffective static filters
   3. (... see if there is any other regression)
   4. Incrementally improve dynamic/adaptive filters to benefit from the 
selective ones with low evaluation cost
   
   
   WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

