discord9 opened a new issue, #22925:
URL: https://github.com/apache/datafusion/issues/22925
### Describe the bug
`AggregateExec::gather_filters_for_pushdown` can reorder the parent filter
results it returns to the filter pushdown optimizer.
The filter pushdown optimizer maps child pushdown results back to parent
filters by position. `AggregateExec` currently splits incoming filters into
`safe_filters` and `unsafe_filters`, builds the child filter description from
the safe filters, then appends the unsupported unsafe filters. Whenever an
unsafe filter precedes a safe one in `parent_filters`, the returned results no
longer line up with the input positions.
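The reordering can be illustrated with a simplified model. The sketch below does not use DataFusion types: `Support`, `split_then_append`, and the string-based filters are hypothetical stand-ins chosen only to show how a split-then-append pattern loses the input order.

```rust
/// Hypothetical stand-in for a per-filter pushdown result; not a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Support {
    Supported,
    Unsupported,
}

/// Models the split-then-append pattern: safe filters are emitted first,
/// then the unsafe filters are appended, regardless of their input position.
fn split_then_append<'a>(
    parent_filters: &[&'a str],
    is_safe: impl Fn(&str) -> bool,
) -> Vec<(&'a str, Support)> {
    // Partition the incoming filters into safe and unsafe groups.
    let (safe_filters, unsafe_filters): (Vec<&str>, Vec<&str>) =
        parent_filters.iter().copied().partition(|f| is_safe(*f));

    safe_filters
        .into_iter()
        .map(|f| (f, Support::Supported))
        .chain(unsafe_filters.into_iter().map(|f| (f, Support::Unsupported)))
        .collect()
}
```

Calling `split_then_append(&["cnt@2 = 1", "b@1 = bar"], |f| f.starts_with("b@1"))` yields the `b@1 = bar` entry first and the `cnt@2 = 1` entry second, which is the reverse of the input order.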
For example, with a filter above an aggregate such as:
```text
cnt@2 = 1 AND b@1 = bar
```
where `cnt` is an aggregate output and `b` is a grouping column:
- `b@1 = bar` is safe to push below the aggregate
- `cnt@2 = 1` must remain above the aggregate
Because the results are reordered, the optimizer can interpret the
pushed-down grouping-column filter result as belonging to the aggregate-output
filter. The aggregate-output filter can then be removed incorrectly, while the
already pushed-down grouping-column filter remains above the aggregate.
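A minimal sketch of the consumer side, assuming the optimizer pairs filters and results purely by index as described above. `Support` and `filters_kept_above` are hypothetical names for illustration, not the optimizer's actual code.

```rust
/// Hypothetical stand-in for a per-filter pushdown result; not a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Support {
    Supported,
    Unsupported,
}

/// Pairs each parent filter with the result at the same index and returns
/// the filters that would be kept above the aggregate (the unsupported ones).
fn filters_kept_above<'a>(
    parent_filters: &[&'a str],
    results: &[Support],
) -> Vec<&'a str> {
    parent_filters
        .iter()
        .zip(results)
        .filter(|(_, support)| **support == Support::Unsupported)
        .map(|(filter, _)| *filter)
        .collect()
}
```

With parent filters `["cnt@2 = 1", "b@1 = bar"]` and reordered results `[Supported, Unsupported]`, this keeps `b@1 = bar` above the aggregate and drops `cnt@2 = 1`. With results in input order, `[Unsupported, Supported]`, it keeps `cnt@2 = 1` as intended.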
### To Reproduce
Add a regression test with a mixed predicate above `AggregateExec`:
```text
FilterExec: cnt@2 = 1 AND b@1 = bar
  AggregateExec: mode=Final, gby=[a@0 as a, b@1 as b], aggr=[cnt]
    DataSourceExec: ...
```
The optimized plan should keep only `cnt@2 = 1` above the aggregate and push
`b@1 = bar` into the scan.
On current `main`, the resulting plan instead keeps `b@1 = bar` above the
aggregate.
### Expected behavior
`AggregateExec::gather_filters_for_pushdown` should preserve the order of
`parent_filters` in its returned parent filter results, marking unsupported
filters in place rather than moving them to the end.
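One way to picture the order-preserving behavior is a single pass that emits exactly one result per input filter, at the same index. This is a sketch under simplified assumptions, not the actual fix: `Support` and `mark_in_place` are hypothetical names, and filters are modeled as strings.

```rust
/// Hypothetical stand-in for a per-filter pushdown result; not a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Support {
    Supported,
    Unsupported,
}

/// Produces one result per parent filter, at the same index as its input,
/// so a positional mapping on the caller's side stays valid.
fn mark_in_place(
    parent_filters: &[&str],
    is_safe: impl Fn(&str) -> bool,
) -> Vec<Support> {
    parent_filters
        .iter()
        .map(|f| {
            if is_safe(*f) {
                Support::Supported
            } else {
                Support::Unsupported
            }
        })
        .collect()
}
```

For `["cnt@2 = 1", "b@1 = bar"]` with a predicate that accepts only the grouping-column filter, the result is `[Unsupported, Supported]`, matching the input positions.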
### Additional context
This affects correctness for mixed aggregate-output and grouping-column
predicates during physical filter pushdown.