peter-toth commented on code in PR #10868: URL: https://github.com/apache/datafusion/pull/10868#discussion_r1636669317
########## datafusion/optimizer/src/common_subexpr_eliminate.rs: ########## @@ -166,6 +166,15 @@ impl CommonSubexprEliminate { ) -> Result<(Vec<Vec<Expr>>, LogicalPlan)> { let mut common_exprs = IndexMap::new(); + input.schema().iter().for_each(|(qualifier, field)| { Review Comment: I don't think we should suppose that all extracted aliases remain in the plan: - Let's suppose that we have a plan like this after some CSE optimization: ``` ... Projection: #1 as c1, #1 as c2, #2 as c3, #2 as c4, (a + 2) as c5, (a + 1 + 1) as c6 Projection: (a + b) as "#1", (a + c) as "#2", a ... ``` - Then some other optimization rule prunes "c1" and "c2" from the plan because they turn out to be unnecessary: ``` ... Projection: #2 as c3, #2 as c4, (a + 2) as c5, (a + 1 + 1) as c6 Projection: (a + c) as "#2", a ... ``` - And then some other rule creates new CSE possibilities: ``` ... Projection: #2 as c3, #2 as c4, (a + 2) as c5, (a + 2) as c6 Projection: (a + c) as "#2", a ... ``` - CSE rule runs again but indexes here and at `build_common_expr_project_plan()` are out of sync... IMO the best thing we can do is to choose a unique aliases for a common expressions in `CommonSubexprRewriter` when we found the expression and store the alias in `common_exprs` together with the expression. In that case we don't need to deal with index sync issues and don't get plans with unnecessary aliases like here: https://github.com/apache/datafusion/pull/10868/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0R1403 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org