[
https://issues.apache.org/jira/browse/SPARK-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zane Hu updated SPARK-4895:
---------------------------
Description:
It seems a valid requirement to allow jobs from different Spark contexts to
share RDDs. It would be limiting if we only allowed sharing RDDs within a
single SparkContext, as in Ooyala (SPARK-818). A more generic way for jobs from
different Spark contexts to collaborate is to support a shared RDD store
managed by an RDD store master and workers running in separate processes from
the SparkContext and executor JVMs. This shared RDD store would not perform any
RDD transformations; it would accept requests from jobs of different Spark
contexts to read and write shared RDDs in memory or on disk across distributed
machines, and manage the life cycle of these RDDs.
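To make the interaction concrete, here is a minimal sketch of what a
client-side API for such a shared RDD store might look like. Everything below
(the SharedRddStore object, its connect/put/get/remove calls, and the
rddstore:// URL) is hypothetical and only illustrates the read/write protocol
described above; it is not an existing Spark API.
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical client for the proposed shared RDD store. The store master and
// workers would run in their own processes, outside the SparkContext and
// executor JVMs; this trait only sketches the calls a job would make.
trait SharedRddStoreClient {
  // Publish an RDD under a name so jobs in other Spark contexts can read it.
  def put[T](name: String, rdd: RDD[T]): Unit
  // Materialize a previously published RDD inside the caller's SparkContext.
  def get[T](sc: SparkContext, name: String): RDD[T]
  // Drop the named RDD; the store manages its life cycle from here on.
  def remove(name: String): Unit
}

object SharedRddStore {
  // "rddstore://host:port" is a made-up URL scheme for the store master;
  // the connection logic is left unimplemented in this sketch.
  def connect(masterUrl: String): SharedRddStoreClient = ???
}

// One application publishes an RDD...
object ProducerJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("producer"))
    val store = SharedRddStore.connect("rddstore://store-master:7100")
    store.put("daily-events", sc.textFile("hdfs:///logs/events"))
    sc.stop()
  }
}

// ...and another application, with its own SparkContext, reads it back.
object ConsumerJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("consumer"))
    val store = SharedRddStore.connect("rddstore://store-master:7100")
    val events = store.get[String](sc, "daily-events")
    println(events.count())
    sc.stop()
  }
}
{code}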
Tachyon might be used for sharing data in this case, but I think Tachyon is
designed more as a general-purpose in-memory distributed file system for any
application, not only for RDDs and Spark.
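For comparison, the closest thing we can do today is to round-trip the data
through a shared file system such as Tachyon or HDFS using the standard
saveAsObjectFile/objectFile calls, as in the sketch below (the tachyon:// path
and the local masters are only examples). This shares the serialized data but
not the RDD itself, and leaves life-cycle management entirely to the
applications, which is what the proposed store would take over.
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object ShareViaFileSystem {
  def main(args: Array[String]): Unit = {
    // Context A writes its RDD out to a shared file system. The tachyon://
    // URI is only an example; any shared file system (e.g. hdfs://) works.
    val scA = new SparkContext(new SparkConf().setAppName("writer").setMaster("local[2]"))
    scA.parallelize(1 to 1000).saveAsObjectFile("tachyon://tachyon-master:19998/shared/numbers")
    scA.stop()

    // Context B reads the same files back as a new RDD. Nothing manages the
    // life cycle of the shared data; cleanup is left to the applications.
    val scB = new SparkContext(new SparkConf().setAppName("reader").setMaster("local[2]"))
    val numbers = scB.objectFile[Int]("tachyon://tachyon-master:19998/shared/numbers")
    println(numbers.count())
    scB.stop()
  }
}
{code}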
If people agree, I can draft a design document for further discussion.
was:
It seems a valid requirement to allow jobs from different Spark contexts to
share RDDs. It would be limiting if we only allowed sharing RDDs within a
single SparkContext, as in Ooyala (SPARK-818). A more generic way for jobs from
different Spark contexts to collaborate is to support a shared RDD store
managed by an RDD store master and workers running in separate processes from
the SparkContext and executor JVMs. This shared RDD store would not perform any
RDD transformations; it would accept requests from jobs of different Spark
contexts to read and write shared RDDs in memory or on disk across distributed
machines, and manage the life cycle of these RDDs.
Tachyon might be used for sharing data, but I think Tachyon is designed more as
a general-purpose in-memory distributed file system for any application, not
only for RDDs and Spark.
If people agree, I can draft a design document for further discussion.
> Support a shared RDD store among different Spark contexts
> ---------------------------------------------------------
>
> Key: SPARK-4895
> URL: https://issues.apache.org/jira/browse/SPARK-4895
> Project: Spark
> Issue Type: New Feature
> Reporter: Zane Hu
>
> It seems a valid requirement to allow jobs from different Spark contexts to
> share RDDs. It would be limiting if we only allowed sharing RDDs within a
> single SparkContext, as in Ooyala (SPARK-818). A more generic way for jobs
> from different Spark contexts to collaborate is to support a shared RDD store
> managed by an RDD store master and workers running in separate processes from
> the SparkContext and executor JVMs. This shared RDD store would not perform
> any RDD transformations; it would accept requests from jobs of different
> Spark contexts to read and write shared RDDs in memory or on disk across
> distributed machines, and manage the life cycle of these RDDs.
> Tachyon might be used for sharing data in this case, but I think Tachyon is
> designed more as a general-purpose in-memory distributed file system for any
> application, not only for RDDs and Spark.
> If people agree, I can draft a design document for further discussion.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]