bvolpato opened a new issue, #22629:
URL: https://github.com/apache/datafusion/issues/22629
## Describe the bug
The Substrait logical-plan consumer only creates a `WindowAggr` node when a
`ProjectRel` expression is itself a `WindowFunction`. If a valid Substrait
project expression nests a window function inside a scalar expression, the
window function stays inside the `Projection` node and the plan cannot be
physically planned.
For example, a `ProjectRel` expression corresponding to:
```sql
SELECT 1 + count(*) OVER () FROM DATA;
```
is consumed as:
```text
Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
  TableScan: DATA
```
rather than introducing a `WindowAggr` node.
## To Reproduce
Add a Substrait fixture whose projected expression is an `add:i64_i64`
scalar function with a nested `count:any` window function, and consume it in
`datafusion/substrait/tests/cases/logical_plans.rs` with execution enabled:
```rust
let plan = from_substrait_plan(&ctx.state(), &proto_plan).await?;
DataFrame::new(ctx.state(), plan).show().await?;
```
Before a fix, a regression snapshot expecting the window node fails with
this difference:
```diff
 Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
-  WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
-    TableScan: DATA
+  TableScan: DATA
```
For non-empty executable plans, this shape reaches physical planning with a
`WindowFunction` still nested in the `Projection`, and the physical planner
rejects it.
## Expected behavior
The Substrait consumer should find window expressions recursively, yielding:
```text
Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
  WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
    TableScan: DATA
```
and the resulting plan should execute.
## Additional context
`datafusion/sql/src/select.rs` and the DataFrame APIs already call
`find_window_exprs(...)` to collect deeply nested window expressions.
`datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs` currently
only checks `if let Expr::WindowFunction(_) = &e`, so Substrait consumption
handles only root window expressions. I have a focused patch and executable
fixture/test prepared.
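
To make the difference between the two checks concrete, here is a minimal, dependency-free sketch. It does not use DataFusion's real `Expr` type or `find_window_exprs`; the `Expr` enum and the functions `root_window_exprs`, `nested_window_exprs`, and `example_projection` are hypothetical stand-ins invented for illustration. It only shows why a root-only match misses `1 + count(1) OVER ()` while a recursive walk finds it.

```rust
// Toy expression tree. This is NOT DataFusion's `Expr`; every name in this
// block is a hypothetical stand-in used only to illustrate the issue.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    // A scalar binary operation such as `1 + <expr>`.
    Binary { left: Box<Expr>, op: char, right: Box<Expr> },
    // A window function call such as `count(1) OVER ()`.
    Window { func: String, args: Vec<Expr> },
}

/// Root-only check: keeps an expression only if the expression itself is a
/// window function, analogous in shape to `if let Expr::WindowFunction(_) = &e`.
fn root_window_exprs(exprs: &[Expr]) -> Vec<Expr> {
    exprs
        .iter()
        .filter(|e| matches!(e, Expr::Window { .. }))
        .cloned()
        .collect()
}

/// Recursive check: walks every expression tree and collects each distinct
/// window function it meets, however deeply nested inside scalar expressions.
fn nested_window_exprs(exprs: &[Expr]) -> Vec<Expr> {
    fn walk(e: &Expr, out: &mut Vec<Expr>) {
        match e {
            Expr::Window { .. } => {
                // Record each window expression once; this toy version does
                // not descend into the window function's own arguments.
                if !out.contains(e) {
                    out.push(e.clone());
                }
            }
            Expr::Binary { left, right, .. } => {
                walk(left, out);
                walk(right, out);
            }
            Expr::Literal(_) => {}
        }
    }

    let mut out = Vec::new();
    for e in exprs {
        walk(e, &mut out);
    }
    out
}

/// Builds the projection list for `SELECT 1 + count(1) OVER ()`.
fn example_projection() -> Vec<Expr> {
    vec![Expr::Binary {
        left: Box::new(Expr::Literal(1)),
        op: '+',
        right: Box::new(Expr::Window {
            func: "count".to_string(),
            args: vec![Expr::Literal(1)],
        }),
    }]
}
```

On `example_projection()`, `root_window_exprs` returns an empty list, so no window node would be introduced, whereas `nested_window_exprs` returns the single `count` window expression that a `WindowAggr` node would need to compute. Stopping at the first window node on each path is a simplification of this sketch, not a statement about how DataFusion's helper behaves.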