alamb commented on issue #9577:
URL: 
https://github.com/apache/arrow-datafusion/issues/9577#issuecomment-1997488393

   > I think, replacing Box<Expr> usages with Arc<Expr> under the enum Expr 
would improve performance. 
   
   I basically agree with this assessment but I have an alternate proposal for 
how to improve performance
   
   1. I agree that the current `clone()` of Exprs (and `Box::clone` does a deep 
copy)
   2. I believe that the `clone`ing of Exprs takes a significant amount of 
planning time in DataFusion (I was looking at `cargo bench --bench sql_planner` 
the other day and substantial amounts of time are spent cloning
   3. I think the "ideal" rust pattern for rewriting Exprs is what we have in 
our ExprRewriters -- take an owned `Expr` and then destructure / rewrite it as 
needed
   4. There are many places in our code where `Expr::clone()` is called 
Unecessirly
   5. If we changed Expr to use `Arc` instead of `Box` I think we would get a 
planning speedup, but it would be less performant than the ideal pattern (as 
rewritten nodes would likely rewrite copying, much like LogicalPlan), and it 
would be a large API change for downstream consumers
   
   Here is an example of what I think is a good pattern (there are no copies 
except when needed) 
   
   
https://github.com/apache/arrow-datafusion/blob/9d0c05b9c2e5f9e774ab1ea08599ac8d4b23c93f/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L638-L648
   
   Here is an example of where Expr cloning is being used unnecessarily
   
   
https://github.com/apache/arrow-datafusion/blob/9d0c05b9c2e5f9e774ab1ea08599ac8d4b23c93f/datafusion/optimizer/src/simplify_expressions/inlist_simplifier.rs#L45-L49
   
   Thus my suggestion is to go through the planner and remove the calls to 
`Expr::clone()` as much as possible (likely letting `cargo bench --bench 
sql_planner` be our guide. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to