tgujar opened a new issue, #10337: URL: https://github.com/apache/datafusion/issues/10337
### Describe the bug Currently Datafusion will inline all CTE, a non-deterministic expression can be executed multiple times producing different results ### To Reproduce Consider the following query which uses the `aggregate_test_100` data from datafusion-examples. Here, column c11 is a Float64 ``` WITH cte as ( SELECT sum(c4 * c11) as total FROM aggregate_test_100 GROUP BY c1) SELECT total FROM cte WHERE total = (select max(total) from cte) ``` The optimized plan generated will inline the CTE and thus execute it twice ``` Projection: cte.total Inner Join: cte.total = __scalar_sq_1.MAX(cte.total) SubqueryAlias: cte Projection: SUM(aggregate_test_100.c4 * aggregate_test_100.c11) AS total Aggregate: groupBy=[[aggregate_test_100.c1]], aggr=[[SUM(CAST(aggregate_test_100.c4 AS Float64) * aggregate_test_100.c11)]] TableScan: aggregate_test_100 projection=[c1, c4, c11] SubqueryAlias: __scalar_sq_1 Aggregate: groupBy=[[]], aggr=[[MAX(cte.total)]] SubqueryAlias: cte Projection: SUM(aggregate_test_100.c4 * aggregate_test_100.c11) AS total Aggregate: groupBy=[[aggregate_test_100.c1]], aggr=[[SUM(CAST(aggregate_test_100.c4 AS Float64) * aggregate_test_100.c11)]] TableScan: aggregate_test_100 projection=[c1, c4, c11] ``` ### Expected behavior Since summation here is dependent on ordering, I believe it is incorrect to inline the CTE here and execute it more than once. ### Additional context Related issue, which talks about possible advantages on not inlining CTE in some cases: https://github.com/apache/datafusion/issues/8777 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org