peter-toth commented on code in PR #10396: URL: https://github.com/apache/datafusion/pull/10396#discussion_r1592927865
########## datafusion/optimizer/src/common_subexpr_eliminate.rs: ########## @@ -720,43 +782,53 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump)); } - let curr_id = &ExprSet::expr_identifier(&expr); - - // lookup previously visited expression - match self.expr_set.get(curr_id) { - Some((_, counter, _, symbol)) => { - // if has a commonly used (a.k.a. 1+ use) expr - if *counter > 1 { - self.affected_id.insert(curr_id.clone()); - - let expr_name = expr.display_name()?; - // Alias this `Column` expr to it original "expr name", - // `projection_push_down` optimizer use "expr name" to eliminate useless - // projections. - Ok(Transformed::new( - col(symbol).alias(expr_name), - true, - TreeNodeRecursion::Jump, - )) - } else { - Ok(Transformed::no(expr)) - } + let (up_index, expr_id) = &self.id_array[self.down_index]; + self.down_index += 1; + + // skip `Expr`s without identifier (empty identifier). + if expr_id.is_empty() { + return Ok(Transformed::no(expr)); + } + + let (counter, _) = self.expr_stats.get(expr_id).unwrap(); + if *counter > 1 { + // step index to skip all sub-node (which has smaller series number). + while self.down_index < self.id_array.len() + && self.id_array[self.down_index].0 < *up_index + { + self.down_index += 1; } - None => Ok(Transformed::no(expr)), + + let expr_name = expr.display_name()?; + self.common_exprs.insert(expr_id.clone(), expr); + // Alias this `Column` expr to it original "expr name", + // `projection_push_down` optimizer use "expr name" to eliminate useless + // projections. + // TODO: do we really need to alias here? Review Comment: Yes, I see that is needed but it looks weird to me. I mean if we consider the example: ``` select a + b as x, a + b as y from ... ``` that should be eliminated to: ``` select common as x, common as y select a + b as common from ... ``` I.e. we need to add aliases to the extracted common expressions, but with this alias here we alias `common` and this is what the rule does: ``` select common as "a + b" as x, common as "a + b" as y select a + b as common from ... ``` I wanted to understand why exactly we need to alias both places. BTW I left a few other TODOs in the code but none of them means that code should be worse than it was before https://github.com/apache/datafusion/pull/9871. Maybe the longer identifiers due to the extra `{`, `}` and` |` can make it a bit slower. I left the TODOs there are points where we can improve the rule a bit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org