peter-toth commented on code in PR #10396:
URL: https://github.com/apache/datafusion/pull/10396#discussion_r1592032033
##########
datafusion/optimizer/src/common_subexpr_eliminate.rs:
##########
@@ -720,43 +782,53 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> {
return Ok(Transformed::new(expr, false, TreeNodeRecursion::Jump));
}
- let curr_id = &ExprSet::expr_identifier(&expr);
-
- // lookup previously visited expression
- match self.expr_set.get(curr_id) {
- Some((_, counter, _, symbol)) => {
- // if has a commonly used (a.k.a. 1+ use) expr
- if *counter > 1 {
- self.affected_id.insert(curr_id.clone());
-
- let expr_name = expr.display_name()?;
- // Alias this `Column` expr to it original "expr name",
- // `projection_push_down` optimizer use "expr name" to
eliminate useless
- // projections.
- Ok(Transformed::new(
- col(symbol).alias(expr_name),
- true,
- TreeNodeRecursion::Jump,
- ))
- } else {
- Ok(Transformed::no(expr))
- }
+ let (up_index, expr_id) = &self.id_array[self.down_index];
Review Comment:
Reviewers might wonder why the code of `CommonSubexprRewriter` after the
https://github.com/apache/datafusion/pull/9871 revert in the first commit
doesn't resemble to the code before the PR:
https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L754-L809?
And why the 2 halting conditions
(https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L757-L761)
are missing from the new code. This is because it turned out that:
- the `self.curr_index >= self.id_array.len()` bound check is poinless as we
already indexed the array with `self.curr_index`:
https://github.com/apache/datafusion/pull/9871/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0L754.
(Note: `curr_index` is called `down_index` in the new PR.)
- `self.max_series_number > *series_number` is also pointless (and actually
there is no point in storing `max_series_number` in the rewriter at all) as
`*series_number` (called `up_index` in the new code, that is basically the
postorder index of the node) can never be smaller than the last replaced
expression's postorder index (`max_series_number`). This is because in the
rewriter we are doing a preorder traversal and a smaller postorder index could
appear only in a replaced expression's subtree or in a subtree of a previous
child of the node's ancestors.
(BTW, I verified this with adding a `panic!` to the halting condition and
running all tests.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]