[ https://issues.apache.org/jira/browse/SPARK-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620296#comment-14620296 ]
Sean Owen commented on SPARK-5842:
----------------------------------
Generally speaking, it's the driver that has enough information to perform the
broadcast, not the executors. That is also why you can't create RDDs from
executors. Turning executors into mini-drivers would be complex and not really
worth it.
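For illustration, here is a minimal sketch of the status quo being described:
sc.broadcast is only callable on the driver, because only the driver holds the
SparkContext. All names, values, and the local master setting below are
illustrative, not taken from MLlib.

    import org.apache.spark.{SparkConf, SparkContext}

    object DriverBroadcastSketch {
      def main(args: Array[String]): Unit = {
        // Local master is just for trying the sketch out.
        val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]")
        val sc = new SparkContext(conf)

        val weights = Array(0.1, 0.2, 0.3)     // illustrative model weights
        val bcWeights = sc.broadcast(weights)  // only possible on the driver

        val data = sc.parallelize(Seq(1.0, 2.0, 3.0))
        // Executors can *read* bcWeights.value inside a closure; calling
        // sc.broadcast(...) from inside this closure would fail, because
        // SparkContext lives only on the driver and is not serializable.
        val score = data.map(x => x * bcWeights.value.sum).sum()
        println(score)

        sc.stop()
      }
    }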
> Allow creating broadcast variables on workers
> ---------------------------------------------
>
> Key: SPARK-5842
> URL: https://issues.apache.org/jira/browse/SPARK-5842
> Project: Spark
> Issue Type: New Feature
> Components: MLlib, Spark Core
> Reporter: Xiangrui Meng
>
> Currently, broadcast variables must be created by the driver. Many algorithms in
> MLlib use the driver to collect gradients and broadcast the new weights,
> which makes the driver a bottleneck. It would be nice if we could create broadcast
> variables on workers and return their handles to the driver. An ML iteration
> would look like the following after this change:
> (training data + broadcast weights) -> reduceByKey -> single-partition RDD
> with the aggregated gradient -> update weights and broadcast them -> driver
> receives the broadcast variable
> where the driver only does the scheduling work.
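For context, a sketch of the driver-centric loop the issue wants to avoid:
each step aggregates the gradient back through the driver, which updates the
weights and re-broadcasts them. The gradient formula, step size, and data
shape are all illustrative assumptions, not MLlib's actual implementation.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    object DriverBottleneckLoop {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("gd-sketch").setMaster("local[2]")
        val sc = new SparkContext(conf)

        // Tiny illustrative (features, label) dataset.
        val data: RDD[(Array[Double], Double)] =
          sc.parallelize(Seq((Array(1.0, 0.0), 1.0), (Array(0.0, 1.0), 0.0)))
        var weights = Array(0.0, 0.0)
        val stepSize = 0.1

        for (_ <- 1 to 10) {
          val bc = sc.broadcast(weights)  // driver -> executors, every iteration
          // Gradients are computed on the executors, but the reduce result
          // still funnels through the driver, which is the bottleneck.
          val gradient = data.map { case (x, y) =>
            val pred = x.zip(bc.value).map { case (xi, wi) => xi * wi }.sum
            x.map(_ * (pred - y))
          }.reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
          weights = weights.zip(gradient).map { case (w, g) => w - stepSize * g }
          bc.unpersist()
        }
        println(weights.mkString(", "))
        sc.stop()
      }
    }

The proposal would let the "update weights and broadcast them" step stay on
the workers, with the driver holding only a handle to the new broadcast.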