dqkqd commented on PR #17852:
URL: https://github.com/apache/datafusion/pull/17852#issuecomment-3358364829

   From 
https://github.com/apache/datafusion/issues/17770#issuecomment-3358276321
   The `scalar_subquery_to_join` changes window aggregate function input 
schema, causing the optimizer to fail.
   
   My initial thought was changing `Window::try_new_with_schema` to 
`Window::try_new` and let it recompute the output schema.
   However, this branch runs when nothing should be optimized ( 
`FoundCommonNodes::No`), using `try_new` here might be costly.
   
   
https://github.com/apache/datafusion/blob/297e537bdfe0ce74e97cc8fd9875bd1af8f18a25/datafusion/optimizer/src/common_subexpr_eliminate.rs#L223-L227
   
   This change worked:
   
   ```DataFusion CLI v50.0.0
   > WITH suppliers AS (
     SELECT *
     FROM (VALUES (1, 10.0), (1, 20.0)) AS t(nation, acctbal)
   )
   SELECT
     ROW_NUMBER() OVER (PARTITION BY nation ORDER BY acctbal DESC) AS rn
   FROM suppliers AS s
   WHERE acctbal > (
     SELECT AVG(acctbal) FROM suppliers
   );
   +----+
   | rn |
   +----+
   | 1  |
   +----+
   1 row(s) fetched.
   Elapsed 0.029 seconds.
   ```
   
   The generated plan:
   ```
       Projection: row_number() PARTITION BY [s.nation] ORDER BY [s.acctbal 
DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS rn 
[rn:UInt64]
         WindowAggr: windowExpr=[[row_number() PARTITION BY [s.nation] ORDER BY 
[s.acctbal DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT 
ROW]] [nation:Int64;N, acctbal:Float64;N, row_number() PARTITION BY [s.nation] 
ORDER BY [s.acctbal DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND 
CURRENT ROW:UInt64]
           Projection: s.nation, s.acctbal [nation:Int64;N, acctbal:Float64;N]
             Inner Join:  Filter: s.acctbal > 
__scalar_sq_1.avg(suppliers.acctbal) [nation:Int64;N, acctbal:Float64;N, 
avg(suppliers.acctbal):Float64;N]
               SubqueryAlias: s [nation:Int64;N, acctbal:Float64;N]
                 SubqueryAlias: suppliers [nation:Int64;N, acctbal:Float64;N]
                   SubqueryAlias: t [nation:Int64;N, acctbal:Float64;N]
                     Projection: column1 AS nation, column2 AS acctbal 
[nation:Int64;N, acctbal:Float64;N]
                       Values: (Int64(1), Float64(10)), (Int64(1), Float64(20)) 
[column1:Int64;N, column2:Float64;N]
               SubqueryAlias: __scalar_sq_1 [avg(suppliers.acctbal):Float64;N]
                 Aggregate: groupBy=[[]], aggr=[[avg(suppliers.acctbal)]] 
[avg(suppliers.acctbal):Float64;N]
                   SubqueryAlias: suppliers [acctbal:Float64;N]
                     SubqueryAlias: t [acctbal:Float64;N]
                       Projection: column2 AS acctbal [acctbal:Float64;N]
                         Values: (Int64(1), Float64(10)), (Int64(1), 
Float64(20)) [column1:Int64;N, column2:Float64;N]
   ``` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to