[ https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112769#comment-14112769 ]

Reynold Xin edited comment on SPARK-3215 at 8/27/14 8:27 PM:
-------------------------------------------------------------

I looked at the document. The high level proposal looks good. Can you update 
the document to include more details? In particular, a few things that are 
important to define are:

1. The interface for "Future"
2. The full interface for RemoteClient, including how to initialize it for 
different cluster manager backends, and how to add application jars
3. The RPC protocol between client and server: transport protocol, frameworks 
to use, version compatibility, etc.
4. Project organization: should this module live in core, or should we create 
a new module in Spark (e.g. a SparkContextClient module)?
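To make points 1 and 2 concrete, here is a minimal sketch of what such a 
client surface could look like. All names (RemoteClient, addJar, submit, 
LocalStubClient) are assumptions for illustration, not the proposed API; the 
stub runs jobs in-process where the real client would go over RPC.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical client interface: submit work, get a Future back.
interface RemoteClient {
    void addJar(String path);               // ship an application jar
    <T> Future<T> submit(Callable<T> job);  // run a job remotely
    void stop();                            // tear down the connection
}

// In-process stub standing in for the real RPC-backed client.
class LocalStubClient implements RemoteClient {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    public void addJar(String path) { /* would upload to the remote context */ }
    public <T> Future<T> submit(Callable<T> job) { return pool.submit(job); }
    public void stop() { pool.shutdown(); }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        RemoteClient client = new LocalStubClient();
        client.addJar("app.jar");
        Future<Integer> result = client.submit(() -> 1 + 1);
        System.out.println(result.get()); // prints 2
        client.stop();
    }
}
```

The Future return type is what lets the caller block, poll, or cancel, so 
pinning down its exact contract (point 1) matters for every method above.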


I think one benefit of this effort is that we can remove yarn-client mode, in 
which we run the SparkContext outside YARN in order to support the REPL. Even 
though it is not part of the scope, with this change we could simply launch a 
REPL and connect it to a remote YARN cluster where the SparkContext runs in a 
YARN AM.



> Add remote interface for SparkContext
> -------------------------------------
>
>                 Key: SPARK-3215
>                 URL: https://issues.apache.org/jira/browse/SPARK-3215
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Marcelo Vanzin
>              Labels: hive
>         Attachments: RemoteSparkContext.pdf
>
>
> A quick description of the issue: as part of running Hive jobs on top of 
> Spark, it's desirable to have a SparkContext running in the background and 
> listening for job requests for a particular user session.
> Running multiple contexts in the same JVM is not a very good solution. Not 
> only does SparkContext currently have issues sharing the same JVM among 
> multiple instances, but doing so also turns the JVM running the contexts 
> into a huge bottleneck in the system.
> So I'm proposing a solution where we have a SparkContext running in a 
> separate process, listening for requests from the client application via 
> some RPC interface (most probably Akka).
> I'll attach a document shortly with the current proposal. Let's use this bug 
> to discuss the proposal and any other suggestions.
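The client/server split described above implies some wire format for job 
requests and results. As a rough illustration only (the message names, fields, 
and the use of Java serialization are assumptions, not the proposal's choices, 
which would likely be Akka messages), a request could be a small serializable 
envelope carrying an id and an opaque payload:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ProtocolDemo {
    // Hypothetical wire message: job id plus an opaque serialized job.
    static class SubmitJob implements Serializable {
        final long id;
        final byte[] payload;
        SubmitJob(long id, byte[] payload) { this.id = id; this.payload = payload; }
    }

    public static void main(String[] args) throws Exception {
        // Client side: serialize the request for the transport.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(new SubmitJob(42L, new byte[] {1, 2, 3}));
        }
        // Server side: deserialize and dispatch by message type.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            SubmitJob msg = (SubmitJob) in.readObject();
            System.out.println(msg.id); // prints 42
        }
    }
}
```

Whatever framework is picked, the versioning of these message classes is 
exactly the compatibility question raised in the comment above.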



--
This message was sent by Atlassian JIRA
(v6.2#6252)
