adriangb commented on code in PR #20117:
URL: https://github.com/apache/datafusion/pull/20117#discussion_r2760298530
##########
datafusion/optimizer/src/push_down_filter.rs:
##########
@@ -1295,6 +1297,18 @@ fn rewrite_projection(
predicates: Vec<Expr>,
mut projection: Projection,
) -> Result<(Transformed<LogicalPlan>, Option<Expr>)> {
+ // Note: This check coordinates with ExtractLeafExpressions optimizer rule.
+ // See extract_leaf_expressions.rs for details on why these projections
exist.
+ // Don't push filters through extracted expression projections.
+ // Pushing filters through would rewrite expressions like
`__datafusion_extracted_1 > 150`
+ // back to `get_field(s,'value') > 150`, undoing the extraction.
+ if is_extracted_expr_projection(&projection) {
Review Comment:
This is obviously not great, but I don't see another way to avoid this.
Otherwise if we have:
```
Filter: get_field(col, 'foo') > 1
TableScan: projection=[col]
```
And we run our new rule to get:
```
Projection: col('col')
Filter: __datafusion_extracted_1 > 1
Projection: get_field(col, 'foo') as __datafusion_extracted_1, col
TableScan: projection=[col]
```
Then this rule runs and will produce:
```
Projection: col('col')
Projection: get_field(col, 'foo') as __datafusion_extracted_1, col
Filter: get_field(col, 'foo') > 1
TableScan projection=[col]
```
Because it wants to push the filter under the projection.
I'd argue as a general rule there's no point in pushing a filter under a
projection that is purely column selections / get_field expressions especially
if we can't then push it further.
Maybe a more robust fix would be to have the filter pushdown optimizer rule
traverse the rest of the plan, find the position it plans to push into and then
check if there's any advantage to doing some (i.e. is it pushing the filter
under an expensive operator that benefits from less input data, or is it just
doing a trivial pointless pushdown like in the case above). But that would be a
lot more involved so I chose this simpler solution for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]