peter-toth commented on code in PR #10868:
URL: https://github.com/apache/datafusion/pull/10868#discussion_r1635129770


##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -515,14 +522,15 @@ fn build_common_expr_project_plan(
     let mut fields_set = BTreeSet::new();
     let mut project_exprs = common_exprs
         .into_iter()
-        .map(|(expr_id, expr)| {
+        .enumerate()
+        .map(|(index, (expr_id, expr))| {

Review Comment:
   So I was thinking of a following example:
   ```
   Projection: (a + b) as "c1", (a + b) as "c2", (a + 2) as "c3", (a + 1 + 1) 
as "c4"
   ```
   When this query gets optimized it might be the case that 
`CommonSubexprEliminate` runs first and extracts `a + b` to `#0` so the plan 
becomes:
   ```
   Projection: #0 as "c1", #0 as "c2", (a + 2) as "c3", (a + 1 + 1) as "c4"
     Projection: a + b as #0, a
   ```
   But then some other rule can do contant folding so the plan becomes:
   ```
   Projection: #0 as "c1", #0 as "c2", (a + 2) as "c3", (a + 2) as "c4" 
     Projection: a + b as #0, a
   ```
   And then in the 2nd optimizer cycle `CommonSubexprEliminate` runs again and 
wants to extract `a + 2`, but as far as I see your PR uses the same 0 based 
aliases again in the 2nd+ runs so the plan would become invalid:
   ```
   Projection: #0 as "c1", #0 as "c2", #0 as "c3", #0 as "c4"
     Projection: a + 2 as "#0", #0  // Here the 2nd rule run starts to use #0 
alias again and the alias collides with the 1st rule run
       Projection: a + b as "#0", a
   ```
   
   I'm not fully familiar with all the optimizer rules in DataFusion, so there 
might not be any "constant folding" one in: 
https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/optimizer.rs#L239-L272,
 but I hope you get the idea why the current change can cause issues.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to