Dandandan commented on issue #9846: URL: https://github.com/apache/arrow-datafusion/issues/9846#issuecomment-2034369728
> Is there a rule of thumb for choosing SMJ over HJ?

I wonder how SMJ in DataFusion compares against HJ at the moment.

Some ideas for when SMJ could be chosen over HJ:

- When the input data is already sorted on the relevant join keys, planning an SMJ is likely faster and requires less memory than planning an HJ, since no additional sort is needed.
- HJ might require more memory than SMJ, so whenever, e.g., data skew is expected, one might choose sort-merge join over hash join.
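
To make the two ideas above concrete, here is a minimal sketch of such a rule of thumb. It is not DataFusion's planner logic: `JoinStrategy`, `JoinInputs`, `choose_join`, and all field names are hypothetical, and falling back to hash join when neither condition holds is an assumption made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class JoinStrategy(Enum):
    HASH = "hash join"
    SORT_MERGE = "sort-merge join"


@dataclass
class JoinInputs:
    # Hypothetical planner inputs; these are not DataFusion statistics or APIs.
    left_sorted_on_keys: bool
    right_sorted_on_keys: bool
    skew_expected: bool


def choose_join(inputs: JoinInputs) -> JoinStrategy:
    """Pick a join strategy from the two heuristics in the comment above."""
    # Idea 1: both sides are already sorted on the join keys, so a
    # sort-merge join can merge directly without an extra sort.
    if inputs.left_sorted_on_keys and inputs.right_sorted_on_keys:
        return JoinStrategy.SORT_MERGE
    # Idea 2: skew is expected, so a hash join might need more memory
    # than a sort-merge join.
    if inputs.skew_expected:
        return JoinStrategy.SORT_MERGE
    # Assumed default for this sketch: use hash join otherwise.
    return JoinStrategy.HASH


if __name__ == "__main__":
    presorted = JoinInputs(True, True, False)
    print(choose_join(presorted).value)
```

Whether these conditions actually favor SMJ in DataFusion would need to be confirmed with benchmarks, which is the open question in this comment.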
