alamb opened a new issue, #9637: URL: https://github.com/apache/arrow-datafusion/issues/9637
### Is your feature request related to a problem or challenge? Broken out from https://github.com/apache/arrow-datafusion/issues/9577 where @mustafasrepo @comphead and @jayzhan211 and I were discussing optimizer performance TLDR is that the datafusion optimizer is slow. When I did some profiling locally by running the following ```rust cargo bench --bench sql_planner -- physical_plan_tpch_all ``` My analysis is that almost 40% of the planning time is spent in SimplifyExprs and CommonSubexprEliminate and most of that time is related to copying expressions from what I can tell  While those passes themselves internally make a bunch of clones, which we are improving (e.g. @jayzhan211 on https://github.com/apache/arrow-datafusion/pull/9628) I think there is a more fundamental structural problem I think a core challenge is that the [OptimizerRule](https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html) trait pretty much *requires* copying `Exprs` on each pass, as it gets a `&LogicalPlan` input, but produces a `LogicalPlan` output ```rust // Required methods fn try_optimize( &self, plan: &LogicalPlan, config: &dyn OptimizerConfig ) -> Result<Option<LogicalPlan>, DataFusionError>; ``` This mean any pass that works on `Exprs` must `clone` all `Exprs` (by calling `LogicalPlan::expressions()`) rewrite them, and then then create a new `LogicalPlan` with those new Exprs. Here is that pattern in the expression simplifier: https://github.com/apache/arrow-datafusion/blob/0eec5f8e1d0f55e48f5cdc628fbb5ddd89b91512/datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs#L112-L123 ### Describe the solution you'd like Find some way to avoid clone'ing exprs during LogicalPlan rewrite ### Describe alternatives you've considered _No response_ ### Additional context We have talked about various other ways to reduce copying of LogicalPlans as well as its challenges in other tickets: * https://github.com/apache/arrow-datafusion/issues/4628 * https://github.com/apache/arrow-datafusion/issues/5157 * https://github.com/apache/arrow-datafusion/issues/9577 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
