[ 
https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788206#comment-16788206
 ] 

Eyal Farago commented on SPARK-17556:
-------------------------------------

Why was this abandoned?

[~viirya]'s pull request seems promising.

I think the last comment by [~LI,Xiao] applies to the current implementation 
as well: executors hold the entire broadcast anyway (assuming they ran a task 
that used it), so the memory footprint on the executor side doesn't change. 
As for the performance regression in the case of multiple smaller partitions, 
this also applies to the current implementation, since the RDD partitions have 
to be computed and transferred to the driver.
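
To make the comparison concrete, the driver-side pattern can be sketched at the 
RDD level roughly as below. This is only an illustration under assumptions, not 
the actual Spark SQL code path: the object name, the sample data and the 
local[2] master are all hypothetical, and only public Spark APIs 
(collectAsMap, SparkContext.broadcast) are used.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of the current driver-side broadcast pattern.
object DriverSideBroadcastSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("driver-side-broadcast-sketch")
      .master("local[2]") // assumption: local run, for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Made-up small (build) side, split into several small partitions,
    // and a made-up large (probe) side.
    val small = sc.parallelize(Seq(1 -> "a", 2 -> "b"), numSlices = 4)
    val large = sc.parallelize(Seq(1 -> 10.0, 2 -> 20.0, 3 -> 30.0))

    // Step 1: every partition of `small` is computed on the executors and
    // its contents are transferred to the driver.
    val collected: Map[Int, String] = small.collectAsMap().toMap

    // Step 2: the driver broadcasts the collected data back out, so each
    // executor that uses it ends up holding the entire map.
    val lookup = sc.broadcast(collected)

    // Step 3: map-side join of the large side against the broadcast value.
    val joined = large.flatMap { case (k, v) =>
      lookup.value.get(k).map(name => (k, name, v)).toSeq
    }

    joined.collect().foreach(println)
    spark.stop()
  }
}
```

Steps 1 and 2 are the round trip through the driver that this issue proposes 
to avoid; step 3 is the same in both approaches.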

One thing I personally think could be improved in [~viirya]'s PR is the 
requirement for the RDD to be pre-persisted. I think the blocks could be 
evaluated in the mapPartitions operation performed in the newly introduced 
RDD.broadcast method, which would have addressed most of the comments by 
[~holdenk_amp] in the PR.

> Executor side broadcast for broadcast joins
> -------------------------------------------
>
>                 Key: SPARK-17556
>                 URL: https://issues.apache.org/jira/browse/SPARK-17556
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: executor broadcast.pdf, executor-side-broadcast.pdf
>
>
> Currently in Spark SQL, in order to perform a broadcast join, the driver must 
> collect the result of an RDD and then broadcast it. This introduces some 
> extra latency. It might be possible to broadcast directly from executors.



