MohamedAbdeen21 commented on code in PR #10396:
URL: https://github.com/apache/datafusion/pull/10396#discussion_r1592944032


##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -720,43 +782,53 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> {
             return Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump));
         }
 
-        let curr_id = &ExprSet::expr_identifier(&expr);
-
-        // lookup previously visited expression
-        match self.expr_set.get(curr_id) {
-            Some((_, counter, _, symbol)) => {
-                // if has a commonly used (a.k.a. 1+ use) expr
-                if *counter > 1 {
-                    self.affected_id.insert(curr_id.clone());
-
-                    let expr_name = expr.display_name()?;
-                    // Alias this `Column` expr to it original "expr name",
-                    // `projection_push_down` optimizer use "expr name" to 
eliminate useless
-                    // projections.
-                    Ok(Transformed::new(
-                        col(symbol).alias(expr_name),
-                        true,
-                        TreeNodeRecursion::Jump,
-                    ))
-                } else {
-                    Ok(Transformed::no(expr))
-                }
+        let (up_index, expr_id) = &self.id_array[self.down_index];
+        self.down_index += 1;
+
+        // skip `Expr`s without identifier (empty identifier).
+        if expr_id.is_empty() {
+            return Ok(Transformed::no(expr));
+        }
+
+        let (counter, _) = self.expr_stats.get(expr_id).unwrap();
+        if *counter > 1 {
+            // step index to skip all sub-node (which has smaller series 
number).
+            while self.down_index < self.id_array.len()
+                && self.id_array[self.down_index].0 < *up_index
+            {
+                self.down_index += 1;
             }
-            None => Ok(Transformed::no(expr)),
+
+            let expr_name = expr.display_name()?;
+            self.common_exprs.insert(expr_id.clone(), expr);
+            // Alias this `Column` expr to it original "expr name",
+            // `projection_push_down` optimizer use "expr name" to eliminate 
useless
+            // projections.
+            // TODO: do we really need to alias here?

Review Comment:
   I have a stashed PR that addresses this issue.
   
   I'm not 100% sure about why the second alias is needed, but there was a 
comment saying that the second alias is used by another rule, so removing it 
inside CSE seems to break other rules (or CSE itself sometimes (??) idk, I'm 
still looking into it).
   
   Anyway, the solution I have at the moment is calling `.unalias()` on the 
expr when applying a new alias through `expr.alias()`. That way, when the true 
alias `x` is later (I assume after all the rules finished) given to the expr 
`common as a + b`, it will become `common as x`.
   
   It passes the tests, but I really want to understand why it works before 
pushing the PR + I hope maybe I can find a better way by removing the second 
alias added by CSE all together, even if it means changing other rules.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to