sadboy commented on issue #4628: URL: https://github.com/apache/arrow-datafusion/issues/4628#issuecomment-1865039407
Yes, this is a very real problem. We see this kind of pattern in production warehouse queries fairly often. Such queries are usually the result of automated query composition, and can get quite big by themselves. Tacking an exponential factor on top makes the system completely unusable (i.e. upwards of an hour just to compile one query, without even invoking the optimizer). It's not just about the memory footprint: if your data structure itself is exponential in size, then that is basically your lower bound for performance, since even a simple operation like `clone()` takes exponential time. In general, exponential blowups in production systems are deal breakers IMO, and removing them should not be considered premature optimization. All that is to say, if you plan to remove `Arc<LogicalPlan>` (on which I'm neutral), then you'll have to replace it with some other mechanism for common subtree sharing.
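To make the blowup concrete, here is a minimal sketch. The types `OwnedPlan` and `SharedPlan` and all helper functions are hypothetical stand-ins, not DataFusion's actual `LogicalPlan`. It assumes a query shape where each level joins the previous level with itself, which is roughly what automated query composition can produce. With `Box` children every reuse is a deep copy; with `Arc` children the reused subtree is a single shared allocation.

```rust
use std::collections::HashSet;
use std::sync::Arc;

// Hypothetical plan node that owns its children: reusing a subtree means copying it.
#[derive(Clone)]
enum OwnedPlan {
    Leaf,
    Join(Box<OwnedPlan>, Box<OwnedPlan>),
}

// Hypothetical plan node with reference-counted children: a subtree can be shared.
enum SharedPlan {
    Leaf,
    Join(Arc<SharedPlan>, Arc<SharedPlan>),
}

// Each level joins the previous level with itself, deep-copying one side.
fn build_owned(depth: u32) -> OwnedPlan {
    let mut plan = OwnedPlan::Leaf;
    for _ in 0..depth {
        plan = OwnedPlan::Join(Box::new(plan.clone()), Box::new(plan));
    }
    plan
}

// Same logical shape, but both sides point at the same allocation.
fn build_shared(depth: u32) -> Arc<SharedPlan> {
    let mut plan = Arc::new(SharedPlan::Leaf);
    for _ in 0..depth {
        plan = Arc::new(SharedPlan::Join(Arc::clone(&plan), plan));
    }
    plan
}

// Counts every node in the owned tree (each copy is a separate node).
fn owned_nodes(plan: &OwnedPlan) -> u64 {
    match plan {
        OwnedPlan::Leaf => 1,
        OwnedPlan::Join(l, r) => 1 + owned_nodes(l) + owned_nodes(r),
    }
}

// Counts distinct allocations in the shared DAG, visiting each pointer once.
fn shared_nodes(plan: &Arc<SharedPlan>, seen: &mut HashSet<*const SharedPlan>) -> u64 {
    if !seen.insert(Arc::as_ptr(plan)) {
        return 0; // already counted this shared subtree
    }
    match plan.as_ref() {
        SharedPlan::Leaf => 1,
        SharedPlan::Join(l, r) => 1 + shared_nodes(l, seen) + shared_nodes(r, seen),
    }
}
```

At depth `d` the owned tree holds `2^(d+1) - 1` nodes, while the shared version holds `d + 1` allocations. Calling `clone()` on the owned plan walks all of those nodes, whereas cloning the `Arc` only bumps a reference count. Any replacement for `Arc<LogicalPlan>` would need to preserve that second property in some form.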
