SAY-5 opened a new pull request, #21646:
URL: https://github.com/apache/datafusion/pull/21646
Closes #21642.
When `PushDownFilter` collapses a parent `Filter` into a child `Filter`, the
combined predicate list was built as
`parents_predicates.chain(child_predicates)`:
```rust
LogicalPlan::Filter(child_filter) => {
let parents_predicates = split_conjunction_owned(filter.predicate);
let child_predicates = split_conjunction_owned(child_filter.predicate);
let new_predicates = parents_predicates
.into_iter()
.chain(child_predicates)
// use IndexSet to remove dupes while preserving predicate order
.collect::<IndexSet<_>>()
...
```
Because the unoptimized plan evaluates inner (child) filters before outer
(parent) filters, this reversed the user-authored execution order of selective
predicates. The reproducer in the issue:
```rust
LogicalPlanBuilder::from(scan)
.filter(col("a").eq(lit(10)))? // applied first (inner)
.filter(col("b").gt(lit(11)))? // applied second (outer)
```
…was optimized into `Filter: b > 11 AND a = 10` rather than the expected
`Filter: a = 10 AND b > 11`. In environments without filter-selectivity
statistics this matters: users who place a cheap or highly-selective filter
first expect it to execute first.
### Fix
Build the conjunction as `child_predicates.chain(parents_predicates)` so
`PushDownFilter` actually preserves the execution order it claims to in the
existing `use IndexSet to remove dupes while preserving predicate order`
comment.
### Snapshot update
`two_filters_on_same_depth` had its `assert_optimized_plan_equal!` snapshot
frozen at the buggy order. The unoptimized plan in that test applies `a <= 1`
first (inner filter) and `a >= 1` second (outer), so the optimized result must
be `a <= 1 AND a >= 1`. Updated the snapshot and added an inline comment
documenting the invariant.
### Verification
`cargo test -p datafusion-optimizer --lib push_down_filter` passes 79/79
tests with the fix and the snapshot update applied.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]