[ 
https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529476#comment-15529476
 ] 

Liang-Chi Hsieh commented on SPARK-17556:
-----------------------------------------

For example, assume we have 3 executors and the broadcasting object is split to 
3 pieces too. You think executor side broadcast will end to 3 executors 
connecting to 3 executors. Let us see.

t1:

E1  E2  E3
p1  p2  p3

t2:

E1 going to fetch p2 from E2, E2 going to fetch p3 from E3, E3 going to fetch 
p2 from E2

E1        E2         E3
p1, p2    p2, p3     p2, p3

t3:

E1 going to fetch p3 from E2, E2 going to fetch p1 from E1, E3 going to fetch 
p1 from E1

E1            E2             E3
p1, p2, p3    p1, p2, p3     p1, p2, p3


Now all executors get all pieces of data. During the broadcast, E1 connected to 
E2, E2 connected to E1, E3, E3 connected to E1, E2. In above, E1 doesn't 
connect to E3.

The simple analysis is based on the assumption that the operations is 
synchronize. But it already shows that the BitTorrent approach can relieve the 
all-to-all transferring problem.





> Executor side broadcast for broadcast joins
> -------------------------------------------
>
>                 Key: SPARK-17556
>                 URL: https://issues.apache.org/jira/browse/SPARK-17556
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>            Reporter: Reynold Xin
>         Attachments: executor broadcast.pdf, executor-side-broadcast.pdf
>
>
> Currently in Spark SQL, in order to perform a broadcast join, the driver must 
> collect the result of an RDD and then broadcast it. This introduces some 
> extra latency. It might be possible to broadcast directly from executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to