alamb commented on PR #3787:
URL: https://github.com/apache/arrow-datafusion/pull/3787#issuecomment-1274633639

   I think keeping the model as simple as possible is likely the most robust 
approach and will lead to the least surprising plans. This PR seems to have a 
reasonably straightforward model, so 👍 
   
   In general, a cost-based optimizer has many known limitations: cost models 
are always estimates and thus can "get it wrong" sometimes, and it is 
typically hard to predict or debug when this happens (for example, because 
of correlations or skew in the data). 
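   
   As a rough illustration of how correlation can throw an estimate off, here 
is a minimal sketch. It assumes a cost model that multiplies per-column 
selectivities as if the columns were independent. That is a common textbook 
simplification, not necessarily what this PR does, and all names below are 
hypothetical rather than DataFusion APIs.

```rust
/// Number of rows for which `pred` holds.
fn count_matching(rows: &[(u32, u32)], pred: impl Fn(&(u32, u32)) -> bool) -> usize {
    rows.iter().filter(|row| pred(*row)).count()
}

/// Hypothetical test data: two perfectly correlated columns (a == b always).
fn correlated_rows(n: u32) -> Vec<(u32, u32)> {
    (0..n).map(|i| (i % 10, i % 10)).collect()
}

/// Returns (estimated_rows, actual_rows) for the filter `a = 0 AND b = 0`.
/// The estimate assumes the two predicates are independent (an assumption
/// made for this sketch, not taken from the PR).
fn estimate_vs_actual(rows: &[(u32, u32)]) -> (f64, usize) {
    let n = rows.len() as f64;
    // Per-column selectivities, as single-column statistics would report them.
    let sel_a = count_matching(rows, |r| r.0 == 0) as f64 / n;
    let sel_b = count_matching(rows, |r| r.1 == 0) as f64 / n;
    // Independence assumption: P(a AND b) = P(a) * P(b).
    let estimated = sel_a * sel_b * n;
    // Ground truth, obtained by evaluating the combined predicate.
    let actual = count_matching(rows, |r| r.0 == 0 && r.1 == 0);
    (estimated, actual)
}
```

   With `correlated_rows(1000)`, each predicate alone keeps 10% of the rows, so 
the estimate is about 10 rows, while 100 rows actually match: a 10x 
underestimate that per-column statistics alone cannot reveal.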
   
   I would personally love to see DataFusion head more towards the "dynamically 
optimize at runtime" approach for joins (for example, having the join operators 
themselves scan their inputs to determine which one is smaller).
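   
   To make the idea concrete, here is one possible sketch of a join that picks 
its build side at runtime. It is not DataFusion code: `adaptive_hash_join` and 
`BuildSide` are hypothetical names, and it uses plain in-memory iterators, 
whereas a real operator would work on record batches and async streams. The 
join pulls rows from both inputs in turn, and whichever input runs out first 
is treated as the smaller one and becomes the hash table.

```rust
use std::collections::HashMap;

/// Which input ended up as the hash-table (build) side.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BuildSide {
    Left,
    Right,
}

/// Group rows by key so that probing is a single hash lookup.
fn build_table(rows: Vec<(u64, String)>) -> HashMap<u64, Vec<String>> {
    let mut table = HashMap::new();
    for (key, value) in rows {
        table.entry(key).or_insert_with(Vec::new).push(value);
    }
    table
}

/// Inner equi-join on the `u64` key of `(key, payload)` rows.
/// Output rows are `(key, left_payload, right_payload)`.
fn adaptive_hash_join<L, R>(mut left: L, mut right: R) -> (BuildSide, Vec<(u64, String, String)>)
where
    L: Iterator<Item = (u64, String)>,
    R: Iterator<Item = (u64, String)>,
{
    let mut left_buf = Vec::new();
    let mut right_buf = Vec::new();

    // Pull one row from each input in turn. The input that is exhausted
    // first is the smaller one (ties go to the left input).
    let side = loop {
        match left.next() {
            Some(row) => left_buf.push(row),
            None => break BuildSide::Left,
        }
        match right.next() {
            Some(row) => right_buf.push(row),
            None => break BuildSide::Right,
        }
    };

    let mut out = Vec::new();
    match side {
        BuildSide::Left => {
            let table = build_table(left_buf);
            // Probe with the rows already buffered, then the rest of the stream.
            for (key, r) in right_buf.into_iter().chain(right) {
                if let Some(matches) = table.get(&key) {
                    for l in matches {
                        out.push((key, l.clone(), r.clone()));
                    }
                }
            }
        }
        BuildSide::Right => {
            let table = build_table(right_buf);
            for (key, l) in left_buf.into_iter().chain(left) {
                if let Some(matches) = table.get(&key) {
                    for r in matches {
                        out.push((key, l.clone(), r.clone()));
                    }
                }
            }
        }
    }
    (side, out)
}
```

   The trade-off in this sketch is that the larger input is also buffered up 
to the size of the smaller one before probing starts. In exchange, the choice 
of build side no longer depends on any up-front estimate.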
   
   That being said, the reason CBO is so prevalent is that it is relatively 
simple to implement and well understood, so I don't have any objections to 
pursuing a more sophisticated cost model for DataFusion.
   
   Thank you for this @isidentical 
   
   

