Re: [PR] Fix and improve `CommonSubexprEliminate` rule [datafusion]

via GitHub Tue, 07 May 2024 01:56:22 -0700


peter-toth commented on code in PR #10396:
URL: https://github.com/apache/datafusion/pull/10396#discussion_r1592032033



##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -720,43 +782,53 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> {
             return Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump));
         }
 
-        let curr_id = &ExprSet::expr_identifier(&expr);
-
-        // lookup previously visited expression
-        match self.expr_set.get(curr_id) {
-            Some((_, counter, _, symbol)) => {
-                // if has a commonly used (a.k.a. 1+ use) expr
-                if *counter > 1 {
-                    self.affected_id.insert(curr_id.clone());
-
-                    let expr_name = expr.display_name()?;
-                    // Alias this `Column` expr to it original "expr name",
-                    // `projection_push_down` optimizer use "expr name" to 
eliminate useless
-                    // projections.
-                    Ok(Transformed::new(
-                        col(symbol).alias(expr_name),
-                        true,
-                        TreeNodeRecursion::Jump,
-                    ))
-                } else {
-                    Ok(Transformed::no(expr))
-                }
+        let (up_index, expr_id) = &self.id_array[self.down_index];

Review Comment:
   Reviewers might wonder why the code of `CommonSubexprRewriter` after the 
https://github.com/apache/datafusion/pull/9871 revert doesn't resemble to the 
code before that PR: 
https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L754-L809?
   And why the 2 halting conditions 
(https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L757-L761)
 are missing from the new code. This is because it turned out that:
   - the `self.curr_index >= self.id_array.len()` bound check is poinless as we 
already indexed the array with `self.curr_index`: 
https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L754.
     (Note: `curr_index` is called `down_index` in the new PR.)
   - `self.max_series_number > *series_number` is also pointless (and actually 
there is no point in storing `max_series_number` in the rewriter at all) as 
`*series_number` (called `up_index` in the new code, that is basically the 
postorder index of the node) can never be smaller than the last replaced 
expression's postorder index (`max_series_number`). This is because in the 
rewriter we are doing a preorder traversal and a smaller postorder index could 
appear only in a replaced expression's subtree or in a subtree of a previous 
child of the node's ancestors.
     (BTW, I verified this with adding a `panic!` to the halting condition and 
running all tests.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Fix and improve `CommonSubexprEliminate` rule [datafusion]

Reply via email to