mustafasrepo opened a new issue, #9576: URL: https://github.com/apache/arrow-datafusion/issues/9576
### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like Currently, common `CommonSubexprEliminate` `LogicalPlan` optimizer rule analyzes common sub-expressions in a query. Then caches, common sub-expression by adding a `LogicalPlan::Projection` if it thinks this is beneficial. As an example, following query ```sql SELECT c3+c4, SUM(c3+c4) OVER(order by c3+c4) FROM t ``` generates following `LogicalPlan`: ``` Projection: t.c3 + t.c4, SUM(t.c3 + t.c4) ORDER BY [t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW --WindowAggr: windowExpr=[[SUM(CAST(t.c3 + t.c4t.c4t.c3 AS t.c3 + t.c4 AS Int64)) ORDER BY [t.c3 + t.c4t.c4t.c3 AS t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS SUM(t.c3 + t.c4) ORDER BY [t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]] ----Projection: t.c3 + t.c4 AS t.c3 + t.c4t.c4t.c3, t.c3, t.c4 ------TableScan: t projection=[c3, c4] ``` where `t.c3+t.c4` is calculated once in the `Projection` then referred by subsequent `WindowAggr` as a column. However, following query: ```sql SELECT c3+c4, SUM(c3+c4) OVER() FROM t ``` generates following `LogicalPlan`: ``` Projection: t.c3 + t.c4, SUM(t.c3 + t.c4) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING --WindowAggr: windowExpr=[[SUM(CAST(t.c3 + t.c4 AS Int64)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]] ----TableScan: t projection=[c3, c4] ``` instead we could generate following plan: ``` Projection: col(t.c3 + t.c4), SUM(t.c3 + t.c4) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING --WindowAggr: windowExpr=[[SUM(CAST(col(t.c3 + t.c4) AS t.c3 + t.c4 AS Int64)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]] ----Projection: t.c3 + t.c4 AS col(t.c3 + t.c4) ------TableScan: t projection=[c3, c4] ``` If were to keep track of common sub expression counts globally across different nodes in the `LogicalPlan`. This will enable us to generate better `LogicalPlan`s. ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
