peter-toth opened a new pull request, #11357:
URL: https://github.com/apache/datafusion/pull/11357

   ## Which issue does this PR close?
   
   Part of https://github.com/apache/datafusion/issues/11194.
   
   ## Rationale for this change
   
   Currently `CommonSubexprEliminate` doesn't recurse into short-circuit 
expression and misses extracing common expressions from surely evaluated legs. 
I.e. from the plan:
   ```
   Projection (a + 1 OR b) AS c1, (a + 1 AND b) AS c2
   ```
   the expression `a + 1` is not extracted despite the fact that it is 
evaluated 2 times.
   
   Also, it would make sense to extract such expressions that are surely 
evalueted only once but there is a chance that they are evaluated conditionally 
as well. I.e. from the plan:
   ```
   Projection (a + 1 OR b) AS c1, (a AND a + 1) AS c2
   ```
   it would make sense to extract `a + 1`.
   
   ## What changes are included in this PR?
   
   This PR:
   - Extends `ExprStats` with conditional evaluation counts.
   - Enhances `ExprIdentifierVisitor` to recurse into children of short-circuit 
expressions and maintain the state of beeing in a conditional expression branch 
with a new `conditional` flag in the visitor.
   - Treats expressions as common if they are surely evaluated at least 2 times 
or evaluated only once but also evaluated conditionally.
   - Fixes a bug in `OptimizeProjections` rule as it currently merges 
consecutive projections when there are multiple references to a certain column 
but they occur in 1 project expression. I.e. this rule currently merges 
projections:
     ```
     Projection: (__common_expr_1 OR random() = Float64(0)) AND 
(__common_expr_1 OR random() = Float64(1)) AS c1                                
                                  
       Projection: t1.a = Float64(1) AS __common_expr_1                         
                                                                                
                 |
         TableScan: t1 projection=[a]                                           
                                                                                
                 |
     ```
     despite `t1.a = Float64(1)` is used 2 times.
     Without this bugfix in `OptimizeProjections` the effect of 
`CommonSubexprEliminate` would be reverted in the optimizer.
   - Adds new `Expr:column_refs_counts()` and `Expr::add_column_ref_counts()` 
APIs.
   
   ## Are these changes tested?
   
   Yes, added new UTs.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to