Dandandan commented on issue #9846:
URL: 
https://github.com/apache/arrow-datafusion/issues/9846#issuecomment-2034369728

   > Is there a rule of thumb for choosing SMJ over HJ?
   
   I wonder how SMJ in DataFusion compares against HJ at the moment. 
   
   Some ideas for when SMJ could be chosen over HJ:
   
   - When the input data is already sorted on the relevant join keys, it is 
likely faster and less memory-intensive to plan an SMJ than an HJ.
   - An HJ might require more memory than an SMJ, so when, for example, data 
skew is expected, one might choose an SMJ over an HJ.
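
As a rough illustration of why pre-sorted inputs favor SMJ, here is a minimal sketch of an inner merge join over two key slices that are assumed to be sorted already. It is not DataFusion's implementation; the function name `sort_merge_join` and the use of plain `i32` keys are assumptions made for the example. The point is that the join advances two cursors and never builds a hash table, so the only extra memory beyond the output is a few indices.

```rust
/// Inner join of two key slices that are ALREADY sorted in ascending order.
/// Returns the matching (left_index, right_index) pairs.
/// Hypothetical helper for illustration only; not a DataFusion API.
fn sort_merge_join(left: &[i32], right: &[i32]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0usize, 0usize);

    while i < left.len() && j < right.len() {
        if left[i] < right[j] {
            // Left key is too small: advance the left cursor.
            i += 1;
        } else if left[i] > right[j] {
            // Right key is too small: advance the right cursor.
            j += 1;
        } else {
            // Keys match: find the end of the run of equal keys on the right.
            let key = left[i];
            let mut j_end = j;
            while j_end < right.len() && right[j_end] == key {
                j_end += 1;
            }
            // Pair every left row in the run with every right row in the run.
            while i < left.len() && left[i] == key {
                for r in j..j_end {
                    out.push((i, r));
                }
                i += 1;
            }
            j = j_end;
        }
    }
    out
}
```

A hash join over the same inputs would first have to materialize one side into a hash table, which is where the extra memory comes from. If the inputs are not sorted, the cost of sorting them has to be added to the SMJ side of the comparison.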


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
