[
https://issues.apache.org/jira/browse/SPARK-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201621#comment-14201621
]
Xuefu Zhang edited comment on SPARK-4290 at 11/7/14 5:37 AM:
-------------------------------------------------------------
Yes, SparkContext#addFile() seems to be what we need. If the files can be more
efficiently broadcast to every executor, that's even better than distributed
cache. In the meantime, we can set a large replication factor for the files to
mitigate the problem.
To clarify, [~sandyr], [~rxin]: do files added via SparkContext.addFile() get
downloaded to each executor automatically, or does SparkFiles.get() have to be
called to make that happen?
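For reference, the usage pattern I have in mind looks roughly like the PySpark
sketch below. The helper names read_added_file and ship_and_read are made up
for illustration; only addFile(), parallelize() and SparkFiles.get() are Spark
calls. The sketch shows how the two calls fit together and does not by itself
show at which point the file is actually copied to the executor.

```python
import os


def read_added_file(file_name):
    """Runs inside a task: open the local copy of a file shipped via addFile()."""
    # Imported inside the function so the lookup happens in the task process.
    from pyspark import SparkFiles

    # SparkFiles.get() returns the absolute local path of the shipped file.
    with open(SparkFiles.get(file_name)) as handle:
        return handle.read().strip()


def ship_and_read(sc, path, num_tasks=2):
    """Register `path` on the driver, then read it back from `num_tasks` tasks."""
    sc.addFile(path)  # driver side: mark the file for distribution
    name = os.path.basename(path)  # tasks look the file up by its base name
    tasks = sc.parallelize(range(num_tasks), num_tasks)
    return tasks.map(lambda _: read_added_file(name)).collect()
```

Given a live SparkContext sc and a small text file, ship_and_read(sc, path)
should return one copy of the file's contents per task.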
> Provide an equivalent functionality of distributed cache as MR does
> -------------------------------------------------------------------
>
> Key: SPARK-4290
> URL: https://issues.apache.org/jira/browse/SPARK-4290
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Xuefu Zhang
>
> MapReduce allows a client to specify files to be put in the distributed cache
> for a job, and the framework guarantees that the files will be available in
> the local file system of any node where a task of the job runs, before the
> task actually starts. While this might be achieved on YARN via hacks, it's not
> available on other cluster managers. It would be nice to have equivalent
> functionality in Spark.
> It would also complement Spark's broadcast variable, which may not be
> suitable in certain scenarios.