Dandandan commented on issue #342: URL: https://github.com/apache/arrow-ballista/issues/342#issuecomment-1276510961
That's a good observation @mingmwang ! The difference with CollectLeft is that that mode collects the left side to one partition, whereas with broadcast we would broadcast the output of the left side to each worker. Indeed, I think the trade off is that doing a bit more on the left side (i.e. building the hash table in each worker) we save the work on the right side (shuffle). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
