sadboy commented on PR #7775: URL: https://github.com/apache/arrow-datafusion/pull/7775#issuecomment-1856178556
Hi, just saw this thread. FWIW we (SDF) recently changed all our internal use of `LogicalPlan` to `Arc<LogicalPlan>` (for reference, this was the change we made on the Datafusion side: https://github.com/sdf-labs/arrow-datafusion/pull/40), and saw no performance impact whatsoever on our semantic analysis workloads. My guess is that `LogicalPlan::clone()` is already a O(1) operation (in the size of the logical plan, because of the Arc pointer to parent plans), so saving that constant factor was negligible in the grand scheme of things. What did turn out to have a *huge* perf impact on our workloads, was the asymptotic behavior of the logical plan constructors. Specifically, many methods in `LogicalPlanBuilder`, e.g. `project` and `join`, perform input sanitization which is (at least) O(n) in the size of the parent plan(s), and as a result using `LogicalPlanBuilder` to construct logical plans takes O(n^2) time in the size of the input query. Anyway, tl;dr is that 1. Choice of `LogicalPlan` vs `Arc<LogicalPlan>` has minimal perf impact, and thus can be made based solely on API ergonomic considerations 2. Logical plan constructors do have large impact on overall perf, and thus should be careful in the operations they perform -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
