adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021713556
########## datafusion/physical-plan/src/filter.rs: ########## @@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec { } try_embed_projection(projection, self) } + + fn push_down_filter( + &self, + expr: Arc<dyn PhysicalExpr>, + ) -> Result<Option<Arc<dyn ExecutionPlan>>> { + let mut input = Arc::clone(&self.input); + if let Some(new_input) = input.push_down_filter(Arc::clone(&expr))? { Review Comment: I think we may need to think of this in the context of the relationship between `TableProvider`, `FilterExec` and the existing predicate pushdown into the `ParquetSource`. So maybe what we need is: ``` trait ExecuionPlan { ... fn push_down_filters( &self, filters: &[&Arc<dyn PhysicalExpr>], ) -> Result<Option<ExecutionPlanFilterPushdownResult>> { Ok(None) } } struct ExecutionPlanFilterPushdownResult { new_plan: Arc<dyn ExecutionPlan>, filter_pushdown: Vec<TableProviderFilterPushDown>, // maybe rename } ``` 1. At planning time `TableProvider` tells you which filters it can handle itself instead of which filters it or the ParquetSource it is creating can handle. This might eliminate a partition column filter that can be evaluated completely at planning time: it gets marked as exact and never run again. Inexact or unsupported filters result in the creation of a `FilterExec`. The big change is that the `TableProvider` does _not_ have to explicitly pass the filters into the `ParquetSource`! 2. Later (optimizer pass?) `FilterExec::push_down_filters` gets called. `FilterExec` calls it's child which ultimately hits the `ParquetSource` which returns all Exacts\*. This travels back up the chain and if the `FilterExec` receives back `Exact` it returns it's children instead of itself. \* I think, but am not sure, that what we have right now is not totally right: there are [filters that the parquet exec pushdown cannot handle](https://github.com/pydantic/datafusion/blob/74813118d229922b948672c876fbbb94861649a7/datafusion/datasource-parquet/src/row_filter.rs#L336), for reasons I don't fully understand. This new API would allow it to reject those as unsupported -> those queries would not break. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org