Zane Hu created SPARK-4895:
------------------------------
Summary: Support a shared RDD store among different Spark contexts
Key: SPARK-4895
URL: https://issues.apache.org/jira/browse/SPARK-4895
Project: Spark
Issue Type: New Feature
Reporter: Zane Hu
Allowing jobs from different Spark contexts to share RDDs seems like a valid
requirement. Sharing RDDs only within a single SparkContext, as in Ooyala
(SPARK-818), is limiting. A more generic way for jobs from different Spark
contexts to collaborate is a shared RDD store, managed by an RDD store master
and workers that run in processes separate from the SparkContext and executor
JVMs. This shared RDD store does not perform any RDD transformations. It
accepts requests from jobs of different Spark contexts to read and write
shared RDDs, in memory or on disk across distributed machines, and manages
the life cycle of these RDDs.
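To make the idea concrete, here is a rough single-process sketch of the
interface such a store might expose. Everything in it is hypothetical: the
names SharedRDDStore, write, read, and drop are not Spark APIs, partitions are
plain Python lists rather than real RDD blocks, and the owner-only drop rule
is just one assumed life-cycle policy. A real store would run as separate
master and worker processes and serve these calls over the network.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SharedRDD:
    """Hypothetical record for one shared RDD held by the store."""
    owner: str               # id of the Spark context that wrote the RDD
    partitions: List[list]   # partition data, kept in memory in this toy


class SharedRDDStore:
    """Toy stand-in for the proposed RDD store master (not a Spark API)."""

    def __init__(self) -> None:
        self._rdds: Dict[str, SharedRDD] = {}

    def write(self, context_id: str, name: str, partitions: List[list]) -> None:
        """Register a new shared RDD under a unique name."""
        if name in self._rdds:
            raise ValueError(f"shared RDD {name!r} already exists")
        # Copy the partitions so later changes by the writer are not visible.
        self._rdds[name] = SharedRDD(context_id, [list(p) for p in partitions])

    def read(self, name: str) -> List[list]:
        """Return a copy of the partitions; any context may read."""
        return [list(p) for p in self._rdds[name].partitions]

    def drop(self, context_id: str, name: str) -> None:
        """End the RDD's life cycle. Assumed policy: only the owner may drop."""
        if self._rdds[name].owner != context_id:
            raise PermissionError(f"{context_id!r} does not own {name!r}")
        del self._rdds[name]

    def names(self) -> List[str]:
        """List the names of all shared RDDs currently held."""
        return sorted(self._rdds)


# One context writes, a different context reads the same data.
store = SharedRDDStore()
store.write("ctx-A", "events", [[1, 2], [3]])
shared = store.read("events")
```

Note that the store only stores and serves partitions. Any transformation
still happens inside the reading job's own Spark context, which matches the
"no RDD transformations" constraint above.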
Tachyon might be used for sharing data, but I think it is designed more as a
general in-memory distributed file system for any application, not
specifically for RDDs and Spark.
If people agree, I can draft a design document for further discussion.