dqkqd commented on PR #17852: URL: https://github.com/apache/datafusion/pull/17852#issuecomment-3358364829
From https://github.com/apache/datafusion/issues/17770#issuecomment-3358276321 The `scalar_subquery_to_join` changes window aggregate function input schema, causing the optimizer to fail. My initial thought was changing `Window::try_new_with_schema` to `Window::try_new` and let it recompute the output schema. However, this branch runs when nothing should be optimized ( `FoundCommonNodes::No`), using `try_new` here might be costly. https://github.com/apache/datafusion/blob/297e537bdfe0ce74e97cc8fd9875bd1af8f18a25/datafusion/optimizer/src/common_subexpr_eliminate.rs#L223-L227 This change worked: ```DataFusion CLI v50.0.0 > WITH suppliers AS ( SELECT * FROM (VALUES (1, 10.0), (1, 20.0)) AS t(nation, acctbal) ) SELECT ROW_NUMBER() OVER (PARTITION BY nation ORDER BY acctbal DESC) AS rn FROM suppliers AS s WHERE acctbal > ( SELECT AVG(acctbal) FROM suppliers ); +----+ | rn | +----+ | 1 | +----+ 1 row(s) fetched. Elapsed 0.029 seconds. ``` The generated plan: ``` Projection: row_number() PARTITION BY [s.nation] ORDER BY [s.acctbal DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS rn [rn:UInt64] WindowAggr: windowExpr=[[row_number() PARTITION BY [s.nation] ORDER BY [s.acctbal DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]] [nation:Int64;N, acctbal:Float64;N, row_number() PARTITION BY [s.nation] ORDER BY [s.acctbal DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:UInt64] Projection: s.nation, s.acctbal [nation:Int64;N, acctbal:Float64;N] Inner Join: Filter: s.acctbal > __scalar_sq_1.avg(suppliers.acctbal) [nation:Int64;N, acctbal:Float64;N, avg(suppliers.acctbal):Float64;N] SubqueryAlias: s [nation:Int64;N, acctbal:Float64;N] SubqueryAlias: suppliers [nation:Int64;N, acctbal:Float64;N] SubqueryAlias: t [nation:Int64;N, acctbal:Float64;N] Projection: column1 AS nation, column2 AS acctbal [nation:Int64;N, acctbal:Float64;N] Values: (Int64(1), Float64(10)), (Int64(1), Float64(20)) [column1:Int64;N, column2:Float64;N] SubqueryAlias: __scalar_sq_1 [avg(suppliers.acctbal):Float64;N] Aggregate: groupBy=[[]], aggr=[[avg(suppliers.acctbal)]] [avg(suppliers.acctbal):Float64;N] SubqueryAlias: suppliers [acctbal:Float64;N] SubqueryAlias: t [acctbal:Float64;N] Projection: column2 AS acctbal [acctbal:Float64;N] Values: (Int64(1), Float64(10)), (Int64(1), Float64(20)) [column1:Int64;N, column2:Float64;N] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
