[ 
https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15518790#comment-15518790
 ] 

Liang-Chi Hsieh commented on SPARK-17556:
-----------------------------------------

For 1): that is true only if your driver is outside of the cluster. In that 
case you can avoid uploading data from the driver to the cluster. If the driver 
runs in cluster mode, then I think there is no obvious difference between 
uploading data from the driver and from any executor.

For 2): I don't think that is exactly correct. Basically we use a 
BitTorrent-like approach to fetch blocks, so by the end the slaves do need to 
connect to all the others.
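
To make the BitTorrent-like fetch concrete, here is a toy simulation in plain 
Python. It is only a sketch and not Spark code: the function name, the node 
names ("driver", "exec-N") and the rule that a node picks a random holder for 
each block are all assumptions made for illustration. The `source` argument 
switches between the driver and an executor as the node that initially holds 
the broadcast value.

```python
import random


def simulate_torrent_broadcast(num_executors, num_blocks, source="driver", seed=0):
    """Toy model of a BitTorrent-like broadcast (hypothetical, not Spark's code).

    `source` initially holds every block. Each executor fetches the blocks in
    its own random order, picking a random current holder for each block, and
    becomes a holder of that block itself once the fetch is done.
    """
    rng = random.Random(seed)
    executors = [f"exec-{i}" for i in range(num_executors)]

    # holders[b] lists the nodes that currently hold block b.
    holders = {b: [source] for b in range(num_blocks)}
    # contacted[e] is the set of peers executor e fetched at least one block from.
    contacted = {e: set() for e in executors}
    # served[n] counts how many block fetches node n answered.
    served = {}

    # Every executor gets its own random permutation of the block ids.
    orders = {e: rng.sample(range(num_blocks), num_blocks) for e in executors}

    for step in range(num_blocks):
        for e in executors:
            b = orders[e][step]
            if e in holders[b]:
                # Happens when the source is itself an executor.
                continue
            peer = rng.choice(holders[b])
            contacted[e].add(peer)
            served[peer] = served.get(peer, 0) + 1
            holders[b].append(e)

    return holders, contacted, served


if __name__ == "__main__":
    for src in ("driver", "exec-0"):
        _, contacted, served = simulate_torrent_broadcast(4, 8, source=src)
        print(src, "served per node:", served)
        print(src, "peers contacted:", {e: sorted(p) for e, p in contacted.items()})
```

In this model the source has to serve every block at least once, whether it is 
the driver or an executor. The only difference is that an executor acting as 
the source does not need to fetch anything itself.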

> Executor side broadcast for broadcast joins
> -------------------------------------------
>
>                 Key: SPARK-17556
>                 URL: https://issues.apache.org/jira/browse/SPARK-17556
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>            Reporter: Reynold Xin
>         Attachments: executor broadcast.pdf, executor-side-broadcast.pdf
>
>
> Currently in Spark SQL, in order to perform a broadcast join, the driver must 
> collect the result of an RDD and then broadcast it. This introduces some 
> extra latency. It might be possible to broadcast directly from executors.



