adriangb commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2991542430
> Is there a duplication with push_down_filter in the logical optimizer?

I don't think so. The logical optimizer pushes filters down into the table scan phase, while the physical optimizer rule pushes them down into the actual physical scan, e.g. Parquet accepts pushdown but CSV does not (I think).

But I hear you! Thinking about it a bit, I think what would make the most sense is for the logical optimizer to push filters down into the TableScan and for the TableProvider to then attach them to the DataSourceExec it creates. That's kind of how it was before, but it created a lot of duplication and special casing. See https://github.com/apache/datafusion/pull/15769. The main con of that approach is that implementing it correctly is quite cumbersome and makes custom TableProviders more complex. Doing the pushdown in an optimizer rule keeps that complexity out of the TableProvider. But maybe we should have made it work with a refactor of TableProvider, e.g. having it return not only an ExecutionPlan but also the filter pushdown results.
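To make the alternative concrete (a TableProvider that returns filter pushdown results together with its plan), here is a rough sketch using simplified stand-in types. None of these names (`HypotheticalTableProvider`, `ScanResult`, `PlanNode`, `Filter`, `PushdownSupport`, `ParquetLikeTable`, `CsvLikeTable`, `residual_filters`) are DataFusion APIs; they are hypothetical. The three-way verdict is loosely modeled on the Unsupported/Inexact/Exact idea behind DataFusion's `TableProviderFilterPushDown`, and having the Parquet-like source report `Inexact` is an assumption made purely for illustration.

```rust
// Hypothetical, simplified stand-ins; NOT the real DataFusion types.

/// Per-filter verdict returned by a scan (assumed three-way split).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushdownSupport {
    Unsupported, // source ignores the filter entirely
    Inexact,     // source uses it (e.g. for pruning) but may return extra rows
    Exact,       // source guarantees only matching rows are returned
}

/// A filter expression, reduced to its text for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Filter(String);

/// A physical scan node with the filters that were attached to it.
#[derive(Debug)]
struct PlanNode {
    name: String,
    pushed_filters: Vec<Filter>,
}

/// What a refactored `scan` might return: the plan plus one verdict per input filter.
struct ScanResult {
    plan: PlanNode,
    pushdown: Vec<PushdownSupport>,
}

/// Hypothetical provider trait: the provider decides pushdown while building its plan.
trait HypotheticalTableProvider {
    fn scan(&self, filters: &[Filter]) -> ScanResult;
}

struct ParquetLikeTable;
struct CsvLikeTable;

impl HypotheticalTableProvider for ParquetLikeTable {
    fn scan(&self, filters: &[Filter]) -> ScanResult {
        // Attach every filter to the scan, but (by assumption) only promise
        // Inexact, so the planner must keep a filter above the scan.
        ScanResult {
            plan: PlanNode { name: "ParquetLikeScan".to_string(), pushed_filters: filters.to_vec() },
            pushdown: vec![PushdownSupport::Inexact; filters.len()],
        }
    }
}

impl HypotheticalTableProvider for CsvLikeTable {
    fn scan(&self, filters: &[Filter]) -> ScanResult {
        // This source cannot use filters at all, so it attaches none.
        ScanResult {
            plan: PlanNode { name: "CsvLikeScan".to_string(), pushed_filters: Vec::new() },
            pushdown: vec![PushdownSupport::Unsupported; filters.len()],
        }
    }
}

/// Filters the planner must still evaluate above the scan: anything not Exact.
fn residual_filters(filters: &[Filter], result: &ScanResult) -> Vec<Filter> {
    filters
        .iter()
        .zip(&result.pushdown)
        .filter(|(_, support)| **support != PushdownSupport::Exact)
        .map(|(f, _)| f.clone())
        .collect()
}
```

With this shape the planner would not need a separate rule to discover what each scan accepted; it could call `residual_filters` on the returned `ScanResult` and wrap the plan in a filter node for whatever is left. The cost is the one the comment points out: every custom provider now has to produce a correct verdict for each filter.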