[GitHub] [arrow-ballista] Dandandan opened a new issue, #342: Support broadcast (join/etc.)

GitBox Tue, 11 Oct 2022 10:05:59 -0700


Dandandan opened a new issue, #342:
URL: https://github.com/apache/arrow-ballista/issues/342


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Broadcasting partitions helps for when 
   
   **Describe the solution you'd like**
   We should support broadcasts in the physical plan.
   
   Broadcasting means copying the entire dataset to each worker.
   
   This could be used for broadcast joins: broadcasting the smaller 
dataframe to every worker can provide big speedups, because the other (big) 
side doesn't have to be shuffled.
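
   To make the idea concrete, here is a minimal single-process sketch of a broadcast hash join. It is not Ballista or DataFusion code: the `Row` type and the functions `build_broadcast_table` and `probe_partition` are hypothetical names chosen for illustration. The hash table built from the small side stands in for the data that would be copied to every worker, and each partition of the big side is probed where it already lives.

```rust
use std::collections::HashMap;

/// A row is (join key, payload). Illustrative type, not a Ballista type.
type Row = (u32, String);

/// Build a hash table from the small side once.
/// In a distributed engine, this is the data each worker would receive.
fn build_broadcast_table(small: &[Row]) -> HashMap<u32, Vec<String>> {
    let mut table: HashMap<u32, Vec<String>> = HashMap::new();
    for (key, value) in small {
        table.entry(*key).or_default().push(value.clone());
    }
    table
}

/// Inner-join one partition of the big side against the broadcast table.
/// The partition is read in place, so the big side is never shuffled.
fn probe_partition(
    big_partition: &[Row],
    table: &HashMap<u32, Vec<String>>,
) -> Vec<(u32, String, String)> {
    let mut out = Vec::new();
    for (key, big_value) in big_partition {
        if let Some(matches) = table.get(key) {
            for small_value in matches {
                out.push((*key, big_value.clone(), small_value.clone()));
            }
        }
    }
    out
}
```

   Each worker would call `probe_partition` on its own partitions with the same table, so the only data that moves across the network is the small side.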
   
   **Describe alternatives you've considered**
   
   **Additional context**
   
   We can probably reuse some of Spark's heuristics for deciding when to 
broadcast one side of a join.
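
   As one possible starting point, Spark broadcasts a join side when its estimated size is at most `spark.sql.autoBroadcastJoinThreshold`, which defaults to 10 MB. The sketch below applies only that size rule. The names `BroadcastSide`, `DEFAULT_BROADCAST_THRESHOLD_BYTES` and `choose_broadcast_side` are hypothetical and do not exist in Ballista, and the sketch ignores the join type, which also restricts which side may be broadcast.

```rust
/// Which input of the join to broadcast. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BroadcastSide {
    Left,
    Right,
}

/// Mirrors Spark's default for spark.sql.autoBroadcastJoinThreshold (10 MB).
const DEFAULT_BROADCAST_THRESHOLD_BYTES: u64 = 10 * 1024 * 1024;

/// Pick a side to broadcast from estimated input sizes in bytes.
/// `None` as an input means the size is unknown, so that side is never chosen.
/// Returns `None` when neither side is known to fit under the threshold.
fn choose_broadcast_side(
    left_bytes: Option<u64>,
    right_bytes: Option<u64>,
    threshold_bytes: u64,
) -> Option<BroadcastSide> {
    // Keep only the sides whose estimate fits under the threshold.
    let left = left_bytes.filter(|&b| b <= threshold_bytes);
    let right = right_bytes.filter(|&b| b <= threshold_bytes);
    match (left, right) {
        // Both fit: broadcast the smaller one (ties go to the left side).
        (Some(l), Some(r)) => Some(if l <= r {
            BroadcastSide::Left
        } else {
            BroadcastSide::Right
        }),
        (Some(_), None) => Some(BroadcastSide::Left),
        (None, Some(_)) => Some(BroadcastSide::Right),
        (None, None) => None,
    }
}
```

   A planner could fall back to a regular shuffled join whenever this returns `None`, which also covers the case where statistics are missing.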


