Dandandan commented on issue #342:
URL: https://github.com/apache/arrow-ballista/issues/342#issuecomment-1276510961

   That's a good observation @mingmwang !
   The difference from CollectLeft is that CollectLeft collects the left side 
into a single partition, whereas a broadcast join would send the output of the 
left side to every worker.
   
   Indeed, I think the trade-off is that by doing a bit more work on the left 
side (i.e. building the hash table in each worker) we avoid the work on the 
right side (the shuffle).
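
   To make the trade-off concrete, here is a minimal sketch in plain Rust. It is not Ballista or DataFusion code: the `Row` and `Joined` types and the `build_hash_table`, `probe`, and `broadcast_join` functions are hypothetical names invented for illustration, and each "worker" is simply modeled as one right-side partition.

```rust
use std::collections::HashMap;

// Hypothetical row types for illustration only: (join key, payload).
type Row = (u32, &'static str);
// Joined output row: (key, left payload, right payload).
type Joined = (u32, &'static str, &'static str);

/// Build a hash table over the (small) left side, keyed by join key.
fn build_hash_table(left: &[Row]) -> HashMap<u32, Vec<&'static str>> {
    let mut table: HashMap<u32, Vec<&'static str>> = HashMap::new();
    for &(key, value) in left {
        table.entry(key).or_default().push(value);
    }
    table
}

/// Probe one right-side partition against a left-side hash table.
fn probe(table: &HashMap<u32, Vec<&'static str>>, right: &[Row]) -> Vec<Joined> {
    let mut out = Vec::new();
    for &(key, right_value) in right {
        if let Some(left_values) = table.get(&key) {
            for &left_value in left_values {
                out.push((key, left_value, right_value));
            }
        }
    }
    out
}

/// Broadcast-style join: every worker receives all left rows, builds its
/// own hash table (the extra left-side work), and probes only its local
/// right partition, so the right side is never repartitioned.
fn broadcast_join(left: &[Row], right_partitions: &[Vec<Row>]) -> Vec<Vec<Joined>> {
    right_partitions
        .iter()
        .map(|partition| {
            let table = build_hash_table(left); // repeated once per worker
            probe(&table, partition)
        })
        .collect()
}
```

   Calling `broadcast_join(&left, &right_partitions)` returns one result vector per right partition. The hash table is built once per worker rather than once overall, which is the extra left-side cost, while the right partitions stay exactly where they are.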

