peter-toth commented on code in PR #10868:
URL: https://github.com/apache/datafusion/pull/10868#discussion_r1636669317


##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -166,6 +166,15 @@ impl CommonSubexprEliminate {
     ) -> Result<(Vec<Vec<Expr>>, LogicalPlan)> {
         let mut common_exprs = IndexMap::new();
 
+        input.schema().iter().for_each(|(qualifier, field)| {

Review Comment:
   I don't think we should suppose that all extracted aliases remain in the 
plan:
   - Let's suppose that we have a plan like this after some CSE optimization:
   ```
   ...
     Projection: #1 as c1, #1 as c2, #2 as c3, #2 as c4, (a + 2) as c5, (a + 1 
+ 1) as c6
       Projection: (a + b) as "#1", (a + c) as "#2", a
   ...
   ```
   - Then some other optimization rule prunes "c1" and "c2" from the plan 
because they turn out to be unnecessary:
   ```
   ...
     Projection: #2 as c3, #2 as c4, (a + 2) as c5, (a + 1 + 1) as c6
       Projection: (a + c) as "#2", a
   ...
   ```
   - And then some other rule creates new CSE possibilities:
   ```
   ...
     Projection: #2 as c3, #2 as c4, (a + 2) as c5, (a + 2) as c6
       Projection: (a + c) as "#2", a
   ...
   ```
   - CSE rule runs again but indexes here and at 
`build_common_expr_project_plan()` are out of sync...
   
   IMO the best thing we can do is to choose a unique aliases for a common 
expressions in `CommonSubexprRewriter` when we found the expression and store 
the alias in `common_exprs` together with the expression. In that case we don't 
need to deal with index sync issues and don't get plans with unnecessary 
aliases like here: 
https://github.com/apache/datafusion/pull/10868/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0R1403



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to