bvolpato opened a new issue, #22629:
URL: https://github.com/apache/datafusion/issues/22629

   ## Describe the bug
   
   The Substrait logical-plan consumer only creates a `WindowAggr` node when a 
`ProjectRel` expression is itself a `WindowFunction`. If a valid Substrait 
project expression nests a window function inside a scalar expression, the 
window remains inside `Projection` and is not physically plannable.
   
   For example, a `ProjectRel` expression corresponding to:
   
   ```sql
   SELECT 1 + count(*) OVER () FROM DATA;
   ```
   
   is consumed as:
   
   ```text
   Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
     TableScan: DATA
   ```
   
   rather than introducing a `WindowAggr` node.
   
   ## To Reproduce
   
   Add a Substrait fixture whose projected expression is an `add:i64_i64` 
scalar function with a nested `count:any` window function, and consume it in 
`datafusion/substrait/tests/cases/logical_plans.rs` with execution enabled:
   
   ```rust
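   // Convert the Substrait plan into a DataFusion logical plan.
   let plan = from_substrait_plan(&ctx.state(), &proto_plan).await?;
   // Execute the plan; this is the step that requires physical planning.
   DataFrame::new(ctx.state(), plan).show().await?;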
   ```
   
   Before a fix, a regression snapshot expecting the window node fails with 
this difference:
   
   ```diff
    Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
   -  WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
   -    TableScan: DATA
   +  TableScan: DATA
   ```
   
   For non-empty executable plans, this shape reaches physical planning with a 
`WindowFunction` still nested in `Projection`, and the physical planner rejects it.
   
   ## Expected behavior
   
   The Substrait consumer should find window expressions recursively, yielding:
   
   ```text
   Projection: Int64(1) + count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS EXPR$0
     WindowAggr: windowExpr=[[count(Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
       TableScan: DATA
   ```
   
   and the resulting plan should execute.
   
   ## Additional context
   
   `datafusion/sql/src/select.rs` and the DataFrame APIs already call 
`find_window_exprs(...)` to collect deeply nested window expressions. 
`datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs` currently 
only checks `if let Expr::WindowFunction(_) = &e`, so Substrait consumption 
handles only root window expressions. I have a focused patch and executable 
fixture/test prepared.
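
   To illustrate the difference between the root-only check and a recursive 
   search, here is a minimal, self-contained sketch. It uses a toy `Expr` enum 
   and the hypothetical helpers `root_window_exprs`, `nested_window_exprs`, and 
   `one_plus_count`; none of these are DataFusion types or APIs, and the sketch 
   is not the prepared patch. It only models the behavior described above.

   ```rust
   /// Toy expression tree (hypothetical; not DataFusion's `Expr`).
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Literal(i64),
       /// A scalar function call such as `add(lhs, rhs)`.
       ScalarFunction { name: String, args: Vec<Expr> },
       /// A window function call such as `count(1) OVER ()`.
       WindowFunction { name: String, args: Vec<Expr> },
   }

   /// Root-only check: finds a window only when it is the whole expression.
   fn root_window_exprs(exprs: &[Expr]) -> Vec<Expr> {
       exprs
           .iter()
           .filter(|e| matches!(e, Expr::WindowFunction { .. }))
           .cloned()
           .collect()
   }

   /// Recursive search: collects each distinct window expression at any depth.
   fn nested_window_exprs(exprs: &[Expr]) -> Vec<Expr> {
       fn visit(e: &Expr, out: &mut Vec<Expr>) {
           match e {
               // Record the window once; this sketch does not search its arguments.
               Expr::WindowFunction { .. } => {
                   if !out.contains(e) {
                       out.push(e.clone());
                   }
               }
               // Descend into scalar function arguments.
               Expr::ScalarFunction { args, .. } => {
                   for arg in args {
                       visit(arg, out);
                   }
               }
               Expr::Literal(_) => {}
           }
       }

       let mut out = Vec::new();
       for e in exprs {
           visit(e, &mut out);
       }
       out
   }

   /// Builds the toy equivalent of `1 + count(1) OVER ()`.
   fn one_plus_count() -> Expr {
       Expr::ScalarFunction {
           name: "add".to_string(),
           args: vec![
               Expr::Literal(1),
               Expr::WindowFunction {
                   name: "count".to_string(),
                   args: vec![Expr::Literal(1)],
               },
           ],
       }
   }
   ```

   For `one_plus_count()`, the root-only check returns nothing, so no window 
   node would be planned, while the recursive search returns the nested `count` 
   window that a `WindowAggr` node would need to compute below the projection.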

