alamb commented on PR #3787: URL: https://github.com/apache/arrow-datafusion/pull/3787#issuecomment-1274633639
I think keeping the model as simple as possible is likely the most robust approach and will lead to the least surprising plans. This PR seems to have a reasonably straightforward model, so 👍

In general, there are many known limitations to a cost-based optimizer: cost models are always estimates and thus can "get it wrong" sometimes, and it is typically hard to predict or debug when this happens (because there are correlations or skew in the data, for example).

I would personally love to see DataFusion head more towards the "dynamically optimize at runtime" approach for joins (for example, having the join operators themselves scan the inputs to see if they can determine which is smaller).

That being said, the reason that CBO is so prevalent is that it is relatively simple to implement and well understood, so I don't have any objections to pursuing a more sophisticated cost model for DataFusion.

Thank you for this @isidentical
