adriangb commented on code in PR #15301:
URL: https://github.com/apache/datafusion/pull/15301#discussion_r2021713556


##########
datafusion/physical-plan/src/filter.rs:
##########
@@ -433,6 +433,22 @@ impl ExecutionPlan for FilterExec {
         }
         try_embed_projection(projection, self)
     }
+
+    fn push_down_filter(
+        &self,
+        expr: Arc<dyn PhysicalExpr>,
+    ) -> Result<Option<Arc<dyn ExecutionPlan>>> {
+        let mut input = Arc::clone(&self.input);
+        if let Some(new_input) = input.push_down_filter(Arc::clone(&expr))? {

Review Comment:
   I think we may need to think of this in the context of the relationship 
between `TableProvider`, `FilterExec` and the existing predicate pushdown into 
the `ParquetSource`. So maybe what we need is:
   
   ```
   trait ExecutionPlan {
    ...
   
     fn push_down_filters(
       &self,
       filters: &[&Arc<dyn PhysicalExpr>],
     ) -> Result<Option<ExecutionPlanFilterPushdownResult>> { Ok(None) }
   }
   
   struct ExecutionPlanFilterPushdownResult {
     new_plan: Arc<dyn ExecutionPlan>,
     filter_pushdown: Vec<TableProviderFilterPushDown>,  // maybe rename
   }
   ```
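   To make the shape of this concrete, here is a minimal, self-contained sketch of the idea with hypothetical stand-in types (`Support`, `Expr`, `Plan`, `PushdownResult` are simplified placeholders, not the real DataFusion types or the proposed API itself):

   ```rust
   use std::sync::Arc;

   // Hypothetical stand-in for TableProviderFilterPushDown (Inexact omitted).
   #[derive(Clone, Copy, Debug, PartialEq)]
   enum Support {
       Unsupported,
       Exact,
   }

   // Stand-in for Arc<dyn PhysicalExpr>.
   type Expr = &'static str;

   struct PushdownResult {
       new_plan: Arc<dyn Plan>,
       support: Vec<Support>,
   }

   trait Plan {
       fn name(&self) -> String;
       // Default: a node that knows nothing about pushdown opts out,
       // mirroring the proposed `Ok(None)` default.
       fn push_down_filters(&self, _filters: &[Expr]) -> Option<PushdownResult> {
           None
       }
   }

   // A scan that can absorb every filter exactly, the way the
   // ParquetSource would claim to in step 2 below.
   struct Scan;

   impl Plan for Scan {
       fn name(&self) -> String {
           "Scan".into()
       }
       fn push_down_filters(&self, filters: &[Expr]) -> Option<PushdownResult> {
           Some(PushdownResult {
               new_plan: Arc::new(Scan), // imagine the filters baked into the scan
               support: vec![Support::Exact; filters.len()],
           })
       }
   }

   // A FilterExec-like node: it forwards the incoming filters plus its own
   // predicate to its child, and if everything comes back Exact it returns
   // its child instead of itself.
   struct Filter {
       predicate: Expr,
       child: Arc<dyn Plan>,
   }

   impl Plan for Filter {
       fn name(&self) -> String {
           format!("Filter({}) -> {}", self.predicate, self.child.name())
       }
       fn push_down_filters(&self, filters: &[Expr]) -> Option<PushdownResult> {
           let mut all = filters.to_vec();
           all.push(self.predicate);
           let child_result = self.child.push_down_filters(&all)?;
           if child_result.support.iter().all(|s| *s == Support::Exact) {
               // Every filter was handled exactly below us: drop this node.
               Some(PushdownResult {
                   new_plan: child_result.new_plan,
                   support: vec![Support::Exact; filters.len()],
               })
           } else {
               None // keep ourselves; a real impl would rebuild with leftovers
           }
       }
   }

   fn main() {
       let plan = Filter { predicate: "a > 1", child: Arc::new(Scan) };
       let result = plan.push_down_filters(&[]).unwrap();
       // The FilterExec-like node removed itself; only the scan remains.
       println!("{}", result.new_plan.name());
   }
   ```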
   
   1. At planning time the `TableProvider` tells you which filters it can handle itself, rather than which filters it or the `ParquetSource` it creates can handle. This might eliminate a partition column filter that can be evaluated completely at planning time: it gets marked as `Exact` and never run again. Inexact or unsupported filters result in the creation of a `FilterExec`. The big change is that the `TableProvider` does _not_ have to explicitly pass the filters into the `ParquetSource`!
   2. Later (in an optimizer pass?) `FilterExec::push_down_filters` gets called. `FilterExec` calls its child, which ultimately hits the `ParquetSource`, which returns all `Exact`s\*. This travels back up the chain, and if the `FilterExec` receives back `Exact` it returns its child instead of itself.
   
   \* I think, but am not sure, that what we have right now is not totally right: there are [filters that the parquet exec pushdown cannot handle](https://github.com/pydantic/datafusion/blob/74813118d229922b948672c876fbbb94861649a7/datafusion/datasource-parquet/src/row_filter.rs#L336), for reasons I don't fully understand. This new API would allow it to reject those as unsupported, so those queries would not break.
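   
   To illustrate that last point: with a per-filter result, a `ParquetSource`-like node can mark predicates it cannot evaluate during the scan as unsupported rather than failing, and the `FilterExec` above simply stays in place for them. A minimal sketch with hypothetical stand-in types (`Support`, `Expr`, and the `evaluable_during_scan` flag are placeholders, not the real row-filter check):

   ```rust
   // Hypothetical per-filter support enum (in spirit like
   // TableProviderFilterPushDown, not the real type).
   #[derive(Clone, Copy, Debug, PartialEq)]
   enum Support {
       Unsupported,
       Exact,
   }

   // Stand-in for a physical filter expression; `evaluable_during_scan`
   // models whatever row_filter.rs checks before accepting a predicate.
   struct Expr {
       text: &'static str,
       evaluable_during_scan: bool,
   }

   // What a ParquetSource-like node could report per filter: reject the
   // ones it cannot handle instead of breaking the query.
   fn classify(filters: &[Expr]) -> Vec<Support> {
       filters
           .iter()
           .map(|f| {
               if f.evaluable_during_scan {
                   Support::Exact
               } else {
                   Support::Unsupported
               }
           })
           .collect()
   }

   fn main() {
       let filters = [
           Expr { text: "a > 1", evaluable_during_scan: true },
           Expr { text: "some_udf(b)", evaluable_during_scan: false },
       ];
       for (f, s) in filters.iter().zip(classify(&filters)) {
           // Unsupported filters stay in the FilterExec above; the query runs.
           println!("{} => {:?}", f.text, s);
       }
   }
   ```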



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

