Re: [PR] reduce clones of LogicalPlan in planner [arrow-datafusion]

via GitHub Thu, 14 Dec 2023 08:40:33 -0800


sadboy commented on PR #7775:
URL: 
https://github.com/apache/arrow-datafusion/pull/7775#issuecomment-1856178556


   Hi, just saw this thread. FWIW we (SDF) recently changed all our internal 
use of `LogicalPlan` to `Arc<LogicalPlan>` (for reference, this was the change 
we made on the Datafusion side: 
https://github.com/sdf-labs/arrow-datafusion/pull/40), and saw no performance 
impact whatsoever on our semantic analysis workloads. My guess is that 
`LogicalPlan::clone()` is already a O(1) operation (in the size of the logical 
plan, because of the Arc pointer to parent plans), so saving that constant 
factor was negligible in the grand scheme of things.
   
   What did turn out to have a *huge* perf impact on our workloads, was the 
asymptotic behavior of the logical plan constructors. Specifically, many 
methods in `LogicalPlanBuilder`, e.g. `project` and `join`, perform input 
sanitization which is (at least) O(n) in the size of the parent plan(s), and as 
a result using `LogicalPlanBuilder` to construct logical plans takes O(n^2) 
time in the size of the input query.
   
   Anyway, tl;dr is that
   1. Choice of `LogicalPlan` vs `Arc<LogicalPlan>` has minimal perf impact, 
and thus can be made based solely on API ergonomic considerations
   2. Logical plan constructors do have large impact on overall perf, and thus 
should be careful in the operations they perform
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] reduce clones of LogicalPlan in planner [arrow-datafusion]

Reply via email to