peter-toth opened a new pull request, #11357: URL: https://github.com/apache/datafusion/pull/11357
## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/11194. ## Rationale for this change Currently `CommonSubexprEliminate` doesn't recurse into short-circuit expression and misses extracing common expressions from surely evaluated legs. I.e. from the plan: ``` Projection (a + 1 OR b) AS c1, (a + 1 AND b) AS c2 ``` the expression `a + 1` is not extracted despite the fact that it is evaluated 2 times. Also, it would make sense to extract such expressions that are surely evalueted only once but there is a chance that they are evaluated conditionally as well. I.e. from the plan: ``` Projection (a + 1 OR b) AS c1, (a AND a + 1) AS c2 ``` it would make sense to extract `a + 1`. ## What changes are included in this PR? This PR: - Extends `ExprStats` with conditional evaluation counts. - Enhances `ExprIdentifierVisitor` to recurse into children of short-circuit expressions and maintain the state of beeing in a conditional expression branch with a new `conditional` flag in the visitor. - Treats expressions as common if they are surely evaluated at least 2 times or evaluated only once but also evaluated conditionally. - Fixes a bug in `OptimizeProjections` rule as it currently merges consecutive projections when there are multiple references to a certain column but they occur in 1 project expression. I.e. this rule currently merges projections: ``` Projection: (__common_expr_1 OR random() = Float64(0)) AND (__common_expr_1 OR random() = Float64(1)) AS c1 Projection: t1.a = Float64(1) AS __common_expr_1 | TableScan: t1 projection=[a] | ``` despite `t1.a = Float64(1)` is used 2 times. Without this bugfix in `OptimizeProjections` the effect of `CommonSubexprEliminate` would be reverted in the optimizer. - Adds new `Expr:column_refs_counts()` and `Expr::add_column_ref_counts()` APIs. ## Are these changes tested? Yes, added new UTs. ## Are there any user-facing changes? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org