Re: [I] Remove Arc from LogicalPlan, stop copying LogicalPlans [arrow-datafusion]

via GitHub Wed, 20 Dec 2023 11:44:44 -0800


sadboy commented on issue #4628:
URL: 
https://github.com/apache/arrow-datafusion/issues/4628#issuecomment-1865039407


   Yes, this is a very real problem. We see this kind of pattern in production 
warehouse queries fairly often. The queries are usually the result of some 
automated query composition, and can get quite big by themselves. Tacking an 
exponential factor on top means the system will be completely unusable (i.e. 
upwards of an hour just to compile one query, without even invoking the 
optimizer).
   
   It's not just about the memory footprint -- if your data structure itself is 
exponential in size, then that's basically your lower bound for performance, as 
a simple operation like `clone()` would take exponential time. In general, 
exponential blowups in production systems are deal breakers IMO, and removing 
them should not be considered premature optimization.
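
   To make the blowup concrete, here is a minimal sketch, assuming a toy plan 
type rather than DataFusion's actual `LogicalPlan`. The names `OwnedPlan`, 
`self_join_chain` and `node_count` are hypothetical and exist only for this 
illustration. Each level joins the previous plan with itself, and because 
every node owns its inputs, the two sides are separate deep copies.

```rust
// Hypothetical toy plan type, NOT DataFusion's actual `LogicalPlan`.
#[derive(Clone)]
enum OwnedPlan {
    Scan,
    // A binary node (think self-join or union) that owns both inputs.
    Join(Box<OwnedPlan>, Box<OwnedPlan>),
}

/// Build a plan where each level joins the previous level with itself.
/// Without sharing, the left input has to be a full deep copy.
fn self_join_chain(depth: u32) -> OwnedPlan {
    let mut plan = OwnedPlan::Scan;
    for _ in 0..depth {
        // `plan.clone()` walks and copies the whole subtree built so far.
        plan = OwnedPlan::Join(Box::new(plan.clone()), Box::new(plan));
    }
    plan
}

/// Count every node, i.e. the work a deep `clone()` has to do.
fn node_count(plan: &OwnedPlan) -> u64 {
    match plan {
        OwnedPlan::Scan => 1,
        OwnedPlan::Join(left, right) => 1 + node_count(left) + node_count(right),
    }
}
```

   In this sketch `node_count(&self_join_chain(10))` is 2047, and every extra 
level roughly doubles it, so any operation that walks or clones the plan 
inherits that exponential cost.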
   
   All that is to say, if you plan to remove `Arc<LogicalPlan>` (which I'm 
neutral on), then you'll have to replace it with some other mechanism for 
common subtree sharing.
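
   For comparison, here is a rough sketch of what subtree sharing buys, again 
assuming a toy type and not DataFusion's real one (`SharedPlan`, 
`shared_self_join_chain` and `count_distinct` are hypothetical names). Both 
inputs of each join point at the same `Arc`, so the plan is a DAG whose size 
grows linearly with depth.

```rust
use std::collections::HashSet;
use std::sync::Arc;

// Hypothetical toy plan type, NOT DataFusion's actual `LogicalPlan`.
enum SharedPlan {
    Scan,
    // Both inputs are reference-counted, so they may be the same subtree.
    Join(Arc<SharedPlan>, Arc<SharedPlan>),
}

/// Same self-join chain, but both inputs share one allocation per level.
fn shared_self_join_chain(depth: u32) -> Arc<SharedPlan> {
    let mut plan = Arc::new(SharedPlan::Scan);
    for _ in 0..depth {
        // `Arc::clone` bumps a reference count; it does not copy the subtree.
        plan = Arc::new(SharedPlan::Join(Arc::clone(&plan), plan));
    }
    plan
}

/// Visit each distinct allocation once, tracking nodes by pointer identity.
fn visit(plan: &Arc<SharedPlan>, seen: &mut HashSet<*const SharedPlan>) {
    if !seen.insert(Arc::as_ptr(plan)) {
        return; // already visited this shared subtree
    }
    if let SharedPlan::Join(left, right) = plan.as_ref() {
        visit(left, seen);
        visit(right, seen);
    }
}

/// Number of distinct nodes actually held in memory.
fn count_distinct(plan: &Arc<SharedPlan>) -> usize {
    let mut seen = HashSet::new();
    visit(plan, &mut seen);
    seen.len()
}
```

   Here a chain of depth 40 holds only 41 distinct nodes, whereas the same 
plan as a fully owned tree would have 2^41 - 1. Note that the saving only 
holds for traversals that remember which shared nodes they have already seen; 
a naive recursive walk over the DAG is still exponential.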


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
