EeshanBembi opened a new pull request, #20231:
URL: https://github.com/apache/datafusion/pull/20231

   ## Which issue does this PR close?
   
   Closes #20194
   
   ## Rationale for this change
   
   A query with `ROW_NUMBER() OVER (... ORDER BY CASE WHEN col='0' THEN 1 ELSE 
0 END)` combined with a filter `nvl(t2.value_2_3,'0')='0'` fails with a 
`SanityCheckPlan` error. This worked in 50.3.0 but broke in 52.1.0.
   
   ## What changes are included in this PR?
   
   **Root cause**: `collect_columns_from_predicate_inner` extracted equality 
pairs even when neither side was a `Column` (e.g. `nvl(col, '0') = '0'`), 
creating equivalence classes that linked complex expressions to literals. 
Because `normalize_expr` traverses expressions deeply, it then replaced the 
literal `'0'` inside unrelated sort/window `CASE WHEN` expressions with the 
complex NVL expression. This corrupted the sort ordering, so the output 
ordering reported by `SortExec` no longer matched the ordering 
`BoundedWindowAggExec` expected.
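The failure mode can be illustrated with a deliberately simplified model. The `Expr` enum, `normalize` and `demo` below are hypothetical and do not mirror DataFusion's `PhysicalExpr` or `normalize_expr` APIs; the sketch also assumes the first member of an equivalence class acts as its representative. It shows how a class `{nvl(t2.value_2_3, '0'), '0'}` lets a deep, top-down rewrite replace the string literal inside an unrelated `CASE WHEN` sort key.

```rust
use std::fmt;

/// Hypothetical toy expression tree; NOT DataFusion's `PhysicalExpr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    /// Literal stored as its SQL text, so `'0'` (string) and `0` (integer) differ.
    Literal(String),
    /// Scalar function call such as `nvl(col, '0')`.
    Call(String, Vec<Expr>),
    /// `CASE WHEN cond THEN a ELSE b END`
    Case(Box<Expr>, Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(text) | Expr::Literal(text) => write!(f, "{text}"),
            Expr::Call(name, args) => {
                let args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                write!(f, "{name}({})", args.join(", "))
            }
            Expr::Case(c, t, e) => write!(f, "CASE WHEN {c} THEN {t} ELSE {e} END"),
            Expr::Eq(l, r) => write!(f, "{l} = {r}"),
        }
    }
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(sql: &str) -> Expr {
    Expr::Literal(sql.to_string())
}

/// Simplified stand-in for deep, top-down normalization: any sub-expression
/// that belongs to the equivalence class is replaced by the representative
/// (assumed to be the first member), and the replacement is not revisited.
fn normalize(expr: &Expr, class: &[Expr]) -> Expr {
    if class.contains(expr) {
        return class[0].clone();
    }
    match expr {
        Expr::Column(_) | Expr::Literal(_) => expr.clone(),
        Expr::Call(name, args) => Expr::Call(
            name.clone(),
            args.iter().map(|a| normalize(a, class)).collect(),
        ),
        Expr::Case(c, t, e) => Expr::Case(
            Box::new(normalize(c, class)),
            Box::new(normalize(t, class)),
            Box::new(normalize(e, class)),
        ),
        Expr::Eq(l, r) => Expr::Eq(Box::new(normalize(l, class)), Box::new(normalize(r, class))),
    }
}

/// Returns the sort key as text before and after normalization.
fn demo() -> (String, String) {
    let nvl = Expr::Call("nvl".to_string(), vec![col("t2.value_2_3"), lit("'0'")]);
    // Pre-fix behaviour being illustrated: `nvl(t2.value_2_3, '0') = '0'` has no
    // bare column on either side, yet it still produced this equivalence class.
    let class = vec![nvl, lit("'0'")];
    // ORDER BY CASE WHEN col = '0' THEN 1 ELSE 0 END
    let sort_key = Expr::Case(
        Box::new(Expr::Eq(Box::new(col("col")), Box::new(lit("'0'")))),
        Box::new(lit("1")),
        Box::new(lit("0")),
    );
    let normalized = normalize(&sort_key, &class);
    (sort_key.to_string(), normalized.to_string())
}

fn main() {
    let (before, after) = demo();
    println!("before: {before}");
    println!("after:  {after}");
}
```

In this toy, the string literal in the `CASE WHEN` comparison is swapped for the NVL call while the integer literals are untouched, which is the kind of rewrite that made the two operators disagree about the ordering.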
   
   **Fix** (two changes in `filter.rs`):
   1. **`collect_columns_from_predicate_inner`**: Only extract equality pairs 
where at least one side is a `Column` reference. This matches the function's 
documented intent ("Column-Pairs") and prevents complex-expression-to-literal 
equivalence classes from being created.
   2. **`extend_constants`**: Treat `Literal` expressions as inherently 
constant. Previously it only called `is_expr_constant` on the input's 
equivalence properties, which are unaware of literals. This keeps constant 
propagation working for `complex_expr = literal` predicates: e.g. 
`nvl(col, '0')` is still marked as constant after the filter.
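The two changes can be sketched against a toy expression model. Everything below (`Expr`, `split_conjunction`, `collect_equal_pairs`, `is_constant`, `extend_constants`, and the columns `t1.a` / `t2.b`) is a hypothetical simplification for illustration; the real code in `filter.rs` works on DataFusion physical expressions and equivalence properties, and this toy only handles `=` predicates.

```rust
/// Hypothetical toy expression tree; NOT DataFusion's `PhysicalExpr`.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Call(String, Vec<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(sql: &str) -> Expr {
    Expr::Literal(sql.to_string())
}

fn equals(l: Expr, r: Expr) -> Expr {
    Expr::Eq(Box::new(l), Box::new(r))
}

/// Flattens `a AND b AND c` into `[a, b, c]`.
fn split_conjunction(e: &Expr) -> Vec<&Expr> {
    match e {
        Expr::And(l, r) => {
            let mut out = split_conjunction(l);
            out.extend(split_conjunction(r));
            out
        }
        other => vec![other],
    }
}

/// Change 1 (sketch): keep an equality pair only if at least one side is a column.
fn collect_equal_pairs(predicate: &Expr) -> Vec<(Expr, Expr)> {
    split_conjunction(predicate)
        .into_iter()
        .filter_map(|conjunct| match conjunct {
            Expr::Eq(l, r) if matches!(**l, Expr::Column(_)) || matches!(**r, Expr::Column(_)) => {
                Some(((**l).clone(), (**r).clone()))
            }
            _ => None,
        })
        .collect()
}

/// Change 2 (sketch): a literal is constant by definition, in addition to
/// whatever the input already reports as constant.
fn is_constant(e: &Expr, input_constants: &[Expr]) -> bool {
    matches!(e, Expr::Literal(_)) || input_constants.contains(e)
}

/// For each `a = b` conjunct, if one side is constant the other side is too.
fn extend_constants(predicate: &Expr, input_constants: &[Expr]) -> Vec<Expr> {
    let mut out = Vec::new();
    for conjunct in split_conjunction(predicate) {
        if let Expr::Eq(l, r) = conjunct {
            if is_constant(l, input_constants) {
                out.push((**r).clone());
            } else if is_constant(r, input_constants) {
                out.push((**l).clone());
            }
        }
    }
    out
}

fn main() {
    let nvl = Expr::Call("nvl".to_string(), vec![col("t2.value_2_3"), lit("'0'")]);
    // nvl(t2.value_2_3, '0') = '0' AND t1.a = t2.b
    let predicate = Expr::And(
        Box::new(equals(nvl, lit("'0'"))),
        Box::new(equals(col("t1.a"), col("t2.b"))),
    );
    println!("equal pairs:   {:?}", collect_equal_pairs(&predicate));
    println!("new constants: {:?}", extend_constants(&predicate, &[]));
}
```

In the sketch, the NVL predicate no longer contributes an equivalence pair, so nothing can rewrite literals elsewhere, while the literal check still records `nvl(t2.value_2_3, '0')` as a constant; the column-to-column predicate is unaffected.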
   
   ## How was this tested?
   
   - Unit test `test_collect_columns_skips_non_column_pairs` verifying the 
filtering logic
   - Sqllogictest reproducing the exact query from the issue
   - Full test suites: equivalence tests (51 passed), physical-plan tests (1255 
passed), physical-optimizer tests (20 passed)
   - Manual verification with datafusion-cli running the reproduction query
   
   ## Test plan
   - [x] Unit test for `collect_columns_from_predicate_inner` column filtering
   - [x] Sqllogictest regression test for #20194
   - [x] Existing test suites pass
   - [x] Manual reproduction query succeeds


-- 
This is an automated message from the Apache Git Service.