[ 
https://issues.apache.org/jira/browse/SPARK-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zane Hu updated SPARK-4895:
---------------------------
    Description: 
It seems a valid requirement to allow jobs from different Spark contexts to 
share RDDs. Restricting sharing to a single SparkContext, as in the Ooyala 
job server (SPARK-818), would be limiting. A more generic way for jobs from 
different Spark contexts to collaborate is to support a shared RDD store, 
managed by an RDD store master and workers that run in processes separate 
from the SparkContext and executor JVMs. This shared RDD store performs no 
RDD transformations; it accepts requests from jobs of different Spark 
contexts to read and write shared RDDs, in memory or on disk across 
distributed machines, and it manages the life cycle of these RDDs.
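To make the idea concrete, a partition-level API for such a store might look 
roughly like the sketch below. This is purely illustrative: the names 
(SharedRddStore, put_partition, get_partition, evict_expired) are my own 
invention, not an existing Spark API, and a real store would be a networked 
master/worker service rather than an in-process dictionary.

```python
import pickle
import time

class SharedRddStore:
    """Illustrative sketch: holds serialized RDD partitions keyed by
    (rdd_name, partition_id), independent of any SparkContext, with a
    simple TTL-based life cycle. Not a real Spark component."""

    def __init__(self):
        # (rdd_name, partition_id) -> (serialized bytes, expiry timestamp)
        self._partitions = {}

    def put_partition(self, rdd_name, partition_id, records, ttl_seconds=3600):
        # A writer job serializes one partition and registers it with a TTL.
        blob = pickle.dumps(records)
        expiry = time.time() + ttl_seconds
        self._partitions[(rdd_name, partition_id)] = (blob, expiry)

    def get_partition(self, rdd_name, partition_id):
        # A reader job, possibly from a different Spark context, fetches
        # the partition and deserializes it.
        blob, expiry = self._partitions[(rdd_name, partition_id)]
        if time.time() > expiry:
            raise KeyError((rdd_name, partition_id))
        return pickle.loads(blob)

    def evict_expired(self):
        # Life-cycle management: drop partitions whose TTL has passed.
        now = time.time()
        dead = [k for k, (_, exp) in self._partitions.items() if exp < now]
        for k in dead:
            del self._partitions[k]
        return len(dead)
```

A job in one Spark context would put_partition its results; a job in another 
context would get_partition them, with the store master deciding placement 
and eviction.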

Tachyon might be used for sharing data in this case, but I think Tachyon is 
designed more as a general-purpose in-memory distributed file system for any 
application, not only for RDDs and Spark.

If people agree, I can draft a design document for further discussion.


> Support a shared RDD store among different Spark contexts
> ---------------------------------------------------------
>
>                 Key: SPARK-4895
>                 URL: https://issues.apache.org/jira/browse/SPARK-4895
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Zane Hu
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
