mustafasrepo opened a new issue, #9576:
URL: https://github.com/apache/arrow-datafusion/issues/9576

   ### Is your feature request related to a problem or challenge?
   
   _No response_
   
   ### Describe the solution you'd like
   
   Currently, common `CommonSubexprEliminate` `LogicalPlan` optimizer rule 
analyzes common sub-expressions in a query. Then caches, common sub-expression 
by adding a `LogicalPlan::Projection` if it thinks this is beneficial.
   As an example, following query
   ```sql
   SELECT c3+c4, SUM(c3+c4) OVER(order by c3+c4)
   FROM t
   ```
   generates following `LogicalPlan`:
   ```
   Projection: t.c3 + t.c4, SUM(t.c3 + t.c4) ORDER BY [t.c3 + t.c4 ASC NULLS 
LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   --WindowAggr: windowExpr=[[SUM(CAST(t.c3 + t.c4t.c4t.c3 AS t.c3 + t.c4 AS 
Int64)) ORDER BY [t.c3 + t.c4t.c4t.c3 AS t.c3 + t.c4 ASC NULLS LAST] RANGE 
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS SUM(t.c3 + t.c4) ORDER BY [t.c3 
+ t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
   ----Projection: t.c3 + t.c4 AS t.c3 + t.c4t.c4t.c3, t.c3, t.c4
   ------TableScan: t projection=[c3, c4]
   ```
   where `t.c3+t.c4` is calculated once in the `Projection` then referred by 
subsequent `WindowAggr` as a column.
   
   However, following query:
   ```sql
   SELECT c3+c4, SUM(c3+c4) OVER()
   FROM t
   ```
   generates following `LogicalPlan`:
   ```
   Projection: t.c3 + t.c4, SUM(t.c3 + t.c4) ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING
   --WindowAggr: windowExpr=[[SUM(CAST(t.c3 + t.c4 AS Int64)) ROWS BETWEEN 
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
   ----TableScan: t projection=[c3, c4]
   ```
   instead we could generate following plan:
   ```
   Projection: col(t.c3 + t.c4), SUM(t.c3 + t.c4) ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING
   --WindowAggr: windowExpr=[[SUM(CAST(col(t.c3 + t.c4) AS t.c3 + t.c4 AS 
Int64)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
   ----Projection: t.c3 + t.c4 AS col(t.c3 + t.c4)
   ------TableScan: t projection=[c3, c4]
   ```
   If were to keep track of common sub expression counts globally across 
different nodes in the `LogicalPlan`. This will enable us to generate better 
`LogicalPlan`s.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to