tgujar opened a new issue, #10337:
URL: https://github.com/apache/datafusion/issues/10337

   ### Describe the bug
   
   Currently Datafusion will inline all CTE, a non-deterministic expression can 
be executed multiple times producing different results
   
   
   ### To Reproduce
   
   Consider the following query which uses the `aggregate_test_100` data from 
datafusion-examples. Here, column c11 is a Float64 
   ```
   WITH cte as (
       SELECT sum(c4 * c11) as total 
       FROM aggregate_test_100 
       GROUP BY c1) 
   SELECT total 
   FROM cte 
   WHERE total = (select max(total) from cte)
   ```
   
   The optimized plan generated will inline the CTE and thus execute it twice
   ```
   Projection: cte.total
     Inner Join: cte.total = __scalar_sq_1.MAX(cte.total)
       SubqueryAlias: cte
         Projection: SUM(aggregate_test_100.c4 * aggregate_test_100.c11) AS 
total
           Aggregate: groupBy=[[aggregate_test_100.c1]], 
aggr=[[SUM(CAST(aggregate_test_100.c4 AS Float64) * aggregate_test_100.c11)]]
             TableScan: aggregate_test_100 projection=[c1, c4, c11]
       SubqueryAlias: __scalar_sq_1
         Aggregate: groupBy=[[]], aggr=[[MAX(cte.total)]]
           SubqueryAlias: cte
             Projection: SUM(aggregate_test_100.c4 * aggregate_test_100.c11) AS 
total
               Aggregate: groupBy=[[aggregate_test_100.c1]], 
aggr=[[SUM(CAST(aggregate_test_100.c4 AS Float64) * aggregate_test_100.c11)]]
                 TableScan: aggregate_test_100 projection=[c1, c4, c11]
   ```
   
   ### Expected behavior
   
   Since summation here is dependent on ordering, I believe it is incorrect to 
inline the CTE here and execute it more than once.
   
   ### Additional context
   
   Related issue, which talks about possible advantages on not inlining CTE in 
some cases: https://github.com/apache/datafusion/issues/8777
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to