alamb opened a new issue, #9637:
URL: https://github.com/apache/arrow-datafusion/issues/9637

   ### Is your feature request related to a problem or challenge?
   
   Broken out from https://github.com/apache/arrow-datafusion/issues/9577 where 
@mustafasrepo  @comphead  and @jayzhan211  and I were discussing optimizer 
performance
   
   TLDR is that the datafusion optimizer is slow. When I did some profiling 
locally by running the following
   
   ```rust
   cargo bench --bench sql_planner -- physical_plan_tpch_all
   ```
   
   My analysis is that almost 40% of the planning time is spent in 
SimplifyExprs and CommonSubexprEliminate and most of that time is related to 
copying expressions from what I can tell
   
   
   ![Screenshot 2024-03-14 at 11 07 57 
AM](https://github.com/apache/arrow-datafusion/assets/490673/1c8214cc-09ec-41b2-aa5d-f52a5dfa4226)
   
   While those passes themselves internally make a bunch of clones,  which we 
are improving (e.g. @jayzhan211  on 
https://github.com/apache/arrow-datafusion/pull/9628)  I think there is a more 
fundamental structural problem
   
   I think a core challenge is that the  
[OptimizerRule](https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html)
 trait pretty much *requires* copying `Exprs` on each pass, as it gets a 
`&LogicalPlan` input, but produces a `LogicalPlan` output
   
   ```rust
       // Required methods
       fn try_optimize(
           &self,
           plan: &LogicalPlan,
           config: &dyn OptimizerConfig
       ) -> Result<Option<LogicalPlan>, DataFusionError>;
   ```
   
   This mean any pass that works on `Exprs` must `clone` all `Exprs` (by 
calling `LogicalPlan::expressions()`) rewrite them, and then then create a new 
`LogicalPlan` with those new Exprs. 
   
   Here is that pattern in the expression simplifier:
   
   
https://github.com/apache/arrow-datafusion/blob/0eec5f8e1d0f55e48f5cdc628fbb5ddd89b91512/datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs#L112-L123
   
   
   ### Describe the solution you'd like
   
   Find some way to avoid clone'ing exprs during LogicalPlan rewrite
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   We have talked about various other ways to reduce copying of LogicalPlans as 
well as its challenges in other tickets:
   * https://github.com/apache/arrow-datafusion/issues/4628
   * https://github.com/apache/arrow-datafusion/issues/5157
   * https://github.com/apache/arrow-datafusion/issues/9577


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to