peter-toth commented on code in PR #10396:
URL: https://github.com/apache/datafusion/pull/10396#discussion_r1592927865


##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -720,43 +782,53 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> {
             return Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump));
         }
 
-        let curr_id = &ExprSet::expr_identifier(&expr);
-
-        // lookup previously visited expression
-        match self.expr_set.get(curr_id) {
-            Some((_, counter, _, symbol)) => {
-                // if has a commonly used (a.k.a. 1+ use) expr
-                if *counter > 1 {
-                    self.affected_id.insert(curr_id.clone());
-
-                    let expr_name = expr.display_name()?;
-                    // Alias this `Column` expr to it original "expr name",
-                    // `projection_push_down` optimizer use "expr name" to 
eliminate useless
-                    // projections.
-                    Ok(Transformed::new(
-                        col(symbol).alias(expr_name),
-                        true,
-                        TreeNodeRecursion::Jump,
-                    ))
-                } else {
-                    Ok(Transformed::no(expr))
-                }
+        let (up_index, expr_id) = &self.id_array[self.down_index];
+        self.down_index += 1;
+
+        // skip `Expr`s without identifier (empty identifier).
+        if expr_id.is_empty() {
+            return Ok(Transformed::no(expr));
+        }
+
+        let (counter, _) = self.expr_stats.get(expr_id).unwrap();
+        if *counter > 1 {
+            // step index to skip all sub-node (which has smaller series 
number).
+            while self.down_index < self.id_array.len()
+                && self.id_array[self.down_index].0 < *up_index
+            {
+                self.down_index += 1;
             }
-            None => Ok(Transformed::no(expr)),
+
+            let expr_name = expr.display_name()?;
+            self.common_exprs.insert(expr_id.clone(), expr);
+            // Alias this `Column` expr to it original "expr name",
+            // `projection_push_down` optimizer use "expr name" to eliminate 
useless
+            // projections.
+            // TODO: do we really need to alias here?

Review Comment:
   Yes, I see that is needed but it looks weird to me. I mean if we consider 
the example:
   ```
   select a + b as x, a + b as y 
   from ...
   ```
   that should be eliminated to:
   ```
   select common as x, common as y
     select a + b as common
     from ...
   ```
   I.e. we need to add aliases to the extracted common expressions, but with 
this alias here we alias `common` and this is what the rule does:
   ```
   select common as "a + b" as x, common as "a + b" as y
     select a + b as common
     from ...
   ```
   I wanted to understand why exactly we need to alias both places.
   
   BTW I left a few other TODOs in the code but none of them means that code 
should be worse than it was before 
https://github.com/apache/datafusion/pull/9871. Maybe the longer identifiers 
due to the extra `{`, `}` and` |` can make it a bit slower. I left the TODOs 
there are points where we can improve the rule a bit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to