peter-toth commented on code in PR #10396: URL: https://github.com/apache/datafusion/pull/10396#discussion_r1592032033
########## datafusion/optimizer/src/common_subexpr_eliminate.rs: ########## @@ -720,43 +782,53 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump)); } - let curr_id = &ExprSet::expr_identifier(&expr); - - // lookup previously visited expression - match self.expr_set.get(curr_id) { - Some((_, counter, _, symbol)) => { - // if has a commonly used (a.k.a. 1+ use) expr - if *counter > 1 { - self.affected_id.insert(curr_id.clone()); - - let expr_name = expr.display_name()?; - // Alias this `Column` expr to it original "expr name", - // `projection_push_down` optimizer use "expr name" to eliminate useless - // projections. - Ok(Transformed::new( - col(symbol).alias(expr_name), - true, - TreeNodeRecursion::Jump, - )) - } else { - Ok(Transformed::no(expr)) - } + let (up_index, expr_id) = &self.id_array[self.down_index]; Review Comment: Reviewers might wonder why the code of `CommonSubexprRewriter` after the https://github.com/apache/datafusion/pull/9871 revert in the first commit doesn't resemble to the code before the PR: https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L754-L809? And why the 2 halting conditions (https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L757-L761) are missing from the new code. This is because it turned out that: - the `self.curr_index >= self.id_array.len()` bound check is poinless as we already indexed the array with `self.curr_index`: https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L754. (Note: `curr_index` is called `down_index` in the new PR.) - `self.max_series_number > *series_number` is also pointless (and actually there is no point in storing `max_series_number` in the rewriter at all) as `*series_number` (called `up_index` in the new code, that is basically the postorder index of the node) can never be smaller than the last replaced expression's postorder index (`max_series_number`). This is because in the rewriter we are doing a preorder traversal and a smaller postorder index could appear only in a replaced expression's subtree or in a subtree of a previous child of the node's ancestors. (BTW, I verified this with adding a `panic!` to the halting condition and running all tests.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org