[jira] [Resolved] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

Sean Owen (JIRA) Sat, 24 Jan 2015 04:48:08 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sean Owen resolved SPARK-3621.
------------------------------
    Resolution: Not a Problem

Given the discussion, this is best solved by reading data directly at the 
workers, rather than involving the driver, or it is already solvable by 
broadcasting values collected on the driver. It won't be possible to broadcast 
an RDD, in any event.

> Provide a way to broadcast an RDD (instead of just a variable made of the 
> RDD) so that a job can access
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3621
>                 URL: https://issues.apache.org/jira/browse/SPARK-3621
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Xuefu Zhang
>
> In some cases, such as Hive's way of doing map-side join, it would be 
> benefcial to allow client program to broadcast RDDs rather than just 
> variables made of these RDDs. Broadcasting a variable made of RDDs requires 
> all RDD data be collected to the driver and that the variable be shipped to 
> the cluster after being made. It would be more performing if driver just 
> broadcasts the RDDs and uses the corresponding data in jobs (such building 
> hashmaps at executors).
> Tez has a broadcast edge which can ship data from previous stage to the next 
> stage, which doesn't require driver side processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Resolved] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

Reply via email to