bvolpato opened a new pull request, #22630:
URL: https://github.com/apache/datafusion/pull/22630
## Which issue does this PR close?
- Closes #22629.
## Rationale for this change
The Substrait `ProjectRel` consumer only added a `WindowAggr` relation when
the root projected expression was a `WindowFunction`. A scalar expression
wrapping a valid window function therefore left the window directly inside
`Projection`, which cannot be physically planned.
Minimal reproducer, as represented by the added Substrait fixture:
```sql
SELECT 1 + count(*) OVER () FROM DATA;
```
Before this patch, the regression test produced this plan difference:
```diff
 Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
-  WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
-    TableScan: DATA
+  TableScan: DATA
```
This is the same class of failure that reaches physical planning as an
unsupported nested `WindowFunction` expression.
## What changes are included in this PR?
- Use the existing `find_window_exprs(...)` recursion when consuming Substrait
`ProjectRel` expressions, retaining the current `HashSet` deduplication across
projections.
- Add a minimal Substrait JSON fixture with a window function nested inside
an arithmetic scalar expression.
- Add a logical-plan snapshot test and an execution regression test, showing
that `WindowAggr` is inserted and the resulting plan is physically executable.
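
To illustrate the first bullet, here is a minimal, self-contained sketch of the difference between a root-only check and a recursive walk with deduplication. It is a toy model only: `ToyExpr`, `root_is_window`, `collect_windows`, `windows_for_projection`, and `reproducer` are hypothetical names invented for this example, and it does not use DataFusion's `Expr` type or the real `find_window_exprs` signature.

```rust
use std::collections::HashSet;

// Toy expression tree; a stand-in for illustration, not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ToyExpr {
    Literal(i64),
    Column(String),
    Add(Box<ToyExpr>, Box<ToyExpr>),
    // A window function call, identified here only by a display name.
    Window(String),
}

// Root-only check: true only when the projected expression itself
// is a window function. A window nested under `Add` is missed.
fn root_is_window(expr: &ToyExpr) -> bool {
    matches!(expr, ToyExpr::Window(_))
}

// Recursive walk: record every window function at any depth.
// `seen` deduplicates; `out` preserves first-seen order.
fn collect_windows(expr: &ToyExpr, seen: &mut HashSet<ToyExpr>, out: &mut Vec<ToyExpr>) {
    match expr {
        ToyExpr::Window(_) => {
            if seen.insert(expr.clone()) {
                out.push(expr.clone());
            }
        }
        ToyExpr::Add(left, right) => {
            collect_windows(left, seen, out);
            collect_windows(right, seen, out);
        }
        ToyExpr::Literal(_) | ToyExpr::Column(_) => {}
    }
}

// Gather the distinct window expressions used anywhere in a projection list.
fn windows_for_projection(exprs: &[ToyExpr]) -> Vec<ToyExpr> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for expr in exprs {
        collect_windows(expr, &mut seen, &mut out);
    }
    out
}

// Same shape as the reproducer query: 1 + count(*) OVER ()
fn reproducer() -> ToyExpr {
    ToyExpr::Add(
        Box::new(ToyExpr::Literal(1)),
        Box::new(ToyExpr::Window("count(*) OVER ()".to_string())),
    )
}
```

In this model, `root_is_window(&reproducer())` is `false`, so a root-only consumer would never plan a window node for it, while `windows_for_projection` finds the nested call and reports it once even if several projected expressions reference it.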
## Are these changes tested?
Pre-fix evidence:
- `cargo test -p datafusion-substrait --test substrait_integration nested_window_function_in_expression -- --nocapture`
failed because the consumed plan omitted the expected `WindowAggr` and placed
`Projection` directly above `TableScan`.
Final validation on Apache `main` commit `d8c458828`:
- `cargo fmt --all -- --check`
- `cargo test -p datafusion-substrait` (49 unit passed, 200 integration
passed, 3 doctests passed; 6 existing ignored)
- `cargo check --all-targets -p datafusion-substrait`
- `cargo check --no-default-features -p datafusion-substrait`
- `cargo check --no-default-features -p datafusion-substrait --features=physical`
- `cargo check --no-default-features -p datafusion-substrait --features=protoc`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `./dev/rust_lint.sh`
## Are there any user-facing changes?
Substrait producers may now send projected expressions that contain nested
window functions; DataFusion consumes them into executable logical plans
instead of leaving unsupported window expressions in projections.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]