adriangb commented on code in PR #15566:
URL: https://github.com/apache/datafusion/pull/15566#discussion_r2029878491


##########
datafusion/physical-plan/src/execution_plan.rs:
##########
@@ -467,8 +467,106 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
     ) -> Result<Option<Arc<dyn ExecutionPlan>>> {
         Ok(None)
     }
+
+    /// Returns a set of filters that this operator owns but would like to be 
pushed down.
+    /// For example, a `TopK` operator may produce dynamic filters that 
reference it's currrent state,
+    /// while a `FilterExec` will just hand of the filters it has as is.
+    /// The default implementation returns an empty vector.
+    /// These filters are applied row-by row and any that return `false` or 
`NULL` will be
+    /// filtered out and any that return `true` will be kept.
+    /// The expressions returned **must** always return `true` or `false`;
+    /// other truthy or falsy values are not allowed (e.g. `0`, `1`).
+    ///
+    /// # Returns
+    /// A vector of filters that this operator would like to push down.
+    /// These should be treated as the split conjunction of a `WHERE` clause.
+    /// That is, a query such as `WHERE a = 1 AND b = 2` would return two
+    /// filters: `a = 1` and `b = 2`.
+    /// They can always be assembled into a single filter using
+    /// [`split_conjunction`][datafusion_physical_expr::split_conjunction].
+    fn filters_for_pushdown(&self) -> Result<Vec<Arc<dyn PhysicalExpr>>> {
+        Ok(Vec::new())
+    }
+
+    /// Checks which filters this node allows to be pushed down through it 
from a parent to a child.
+    /// For example, a `ProjectionExec` node can allow filters that only 
refernece
+    /// columns it did not create through but filters that reference columns 
it is creating cannot be pushed down any further.
+    /// That is, it only allows some filters through because it changes the 
schema of the data.
+    /// Aggregation nodes may not allow any filters to be pushed down as they 
change the cardinality of the data.
+    /// RepartitionExec nodes allow all filters to be pushed down as they 
don't change the schema or cardinality.
+    fn filter_pushdown_request(

Review Comment:
   > Instead of two methods (request + rewrite) I wonder if we could simplify 
the API by combining the two following the model of 
ExecutionPlan::try_swapping_with_projection
   
   The reason it's two methods is that we need to recurse into children before 
passing the filters into the current node, but we need to know which filters we 
are allowed to push into children.
   In other words, the between those two methods is when we recurse into the 
children.
   
   I think we could possibly combine `filter_pushdown_request` and 
`filters_for_pushdown` into:
   
   ```rust
   struct FilterPushdownResonse {
     parent_filters: Vec<FilterPushdownAllowed>,
     own_filters: Vec<PhysicalExprRef>,
   }
   
   fn request_filters_for_pushdown(
     &self,
     parent_filters: &[PhysicalExprRef],
   ) -> Result<
   ```
   
   But I don't know that this is much simpler.
   The point is that we need to keep track of which are parent filters and 
which are this node's own filters because we need to transmit that information 
up the recursion.
   
   > Instead of FilterSupport, I think we could also follow the model and 
naming of
   
https://docs.rs/datafusion/latest/datafusion/datasource/provider/enum.TableProviderFilterPushDown.html
   
   I thought about doing this, the reason I didn't go with that is that:
   1. For `FilterPushdownAllowed` it needs to be able to rewrite the filter 
that is going to be pushed down so that projections can adjust the schema to 
their input schema if needed.
   2. For both `FilterPushdownAllowed` and `FilterSupport` there is no 
distinction between `Unhandled` and `Inexact`. To be honest I'm not sure how 
useful that distinction is in `TableProviderFilterPushDown` either, as far as I 
can tell all `Unhandled` gets you is that you don't get called with filters you 
possibly don't know -> you don't have to ignore them.
   
   We can try consolidating but I fear it may end up making things more 
complicated to grok not simpler.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to