Xuefu Zhang created SPARK-4290:
----------------------------------
Summary: Provide functionality equivalent to the distributed cache in MR
Key: SPARK-4290
URL: https://issues.apache.org/jira/browse/SPARK-4290
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Xuefu Zhang
MapReduce allows a client to specify files to be placed in the distributed cache
for a job, and the framework guarantees that each file will be available on the
local file system of every node where a task of the job runs, before the task
actually starts. While this might be achieved on YARN via hacks, it is not
available on other cluster managers. It would be nice to have equivalent
functionality in Spark.
It would also complement Spark's broadcast variables, which deliver in-memory
values rather than local files and may not be suitable in certain scenarios.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)