adriangb commented on issue #16188:
URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2991542430

   > Is there a duplication with push_down_filter in the logical optimizer?
   
   I don't think so. The logical optimizer pushes filters down into the table 
scan phase, while the physical optimizer rule pushes them down into the actual 
physical scan. For example, Parquet accepts pushdown but CSV does not (I think).
   
   But I hear you!
   
   Thinking about it a bit, I think what would make the most sense is for the 
logical optimizer to push filters down into the TableScan, and for the 
TableProvider to then attach them to the DataSourceExec that it creates. That's 
kind of how it was before, but it created a lot of duplication and special 
casing; see https://github.com/apache/datafusion/pull/15769. The main con of 
this approach is that implementing it correctly is quite cumbersome and makes 
custom TableProviders more complex. Having that happen as an optimizer rule 
keeps the complexity out of the TableProvider. But maybe we should have made it 
work with a refactor to TableProvider, e.g. to return not only an ExecutionPlan 
but also filter pushdown results or something.
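
   To make that last idea a bit more concrete, here is a rough sketch of what 
"return not only an ExecutionPlan but also filter pushdown results" could look 
like. Everything in it is a hypothetical, simplified stand-in and not 
DataFusion's actual API: `Filter`, `Pushdown`, `ScanPlan`, `ScanWithPushdown`, 
`ToyTableProvider`, `FileTable` and `remaining_filters` are names made up for 
illustration, and the "Parquet accepts, CSV rejects" behavior just mirrors the 
guess above.

```rust
// Hypothetical, simplified stand-ins. None of these are DataFusion types.

/// A filter predicate, kept as a plain string for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Filter(pub String);

/// Whether the physical scan took responsibility for a filter.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Pushdown {
    Accepted,
    Rejected,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FileFormat {
    Parquet,
    Csv,
}

/// Toy physical scan node that remembers the filters it absorbed.
#[derive(Debug, PartialEq)]
pub struct ScanPlan {
    pub format: FileFormat,
    pub pushed_filters: Vec<Filter>,
}

/// Hypothetical return type: the plan plus one result per input filter.
pub struct ScanWithPushdown {
    pub plan: ScanPlan,
    pub results: Vec<Pushdown>,
}

/// Toy provider trait whose scan reports the pushdown outcome directly.
pub trait ToyTableProvider {
    fn scan_with_filters(&self, filters: &[Filter]) -> ScanWithPushdown;
}

pub struct FileTable {
    pub format: FileFormat,
}

impl ToyTableProvider for FileTable {
    fn scan_with_filters(&self, filters: &[Filter]) -> ScanWithPushdown {
        // Assumption for this sketch: Parquet accepts every filter, CSV none.
        let accepts = matches!(self.format, FileFormat::Parquet);
        let results = filters
            .iter()
            .map(|_| if accepts { Pushdown::Accepted } else { Pushdown::Rejected })
            .collect();
        let pushed_filters = if accepts { filters.to_vec() } else { Vec::new() };
        ScanWithPushdown {
            plan: ScanPlan { format: self.format, pushed_filters },
            results,
        }
    }
}

/// Filters the caller still has to evaluate above the scan.
pub fn remaining_filters(filters: &[Filter], results: &[Pushdown]) -> Vec<Filter> {
    filters
        .iter()
        .zip(results)
        .filter(|(_, r)| **r == Pushdown::Rejected)
        .map(|(f, _)| f.clone())
        .collect()
}
```

   With something shaped like this, the planner could call `scan_with_filters` 
once and keep a filter node above the scan only for whatever 
`remaining_filters` returns. The cost is the one mentioned above: every custom 
provider would have to produce the per-filter results correctly.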


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

