[ https://issues.apache.org/jira/browse/SPARK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113206#comment-14113206 ]
Marcelo Vanzin commented on SPARK-3215:
---------------------------------------
Matei, yes, all those things exist, but that is not what I'd like to discuss at
this point. What I'm trying to discuss is:
- Are these all the functional requirements needed to cover the use cases we
have at hand (at the moment, Hive on Spark)?
- Is this something that should live in the core, alongside the core, or
somewhere else entirely?
Neither of those questions depends on the choice of technology used to implement
the feature, and neither is affected by the things you mention. Those are all
implementation details for when a consensus is reached on what to implement.
Once those two questions are sorted out, yes, then we can start to discuss
details of the API and how to implement it. But in my view it's too early to
get into those discussions.
And yes, as I said before, it's very possible for people to implement their own
version of this. The point I'm making here is that it would be nice to have
this readily available so people don't have to do that. Kinda like how the Scala
standard library added Futures so people would stop implementing their own...
Regardless of where this code ends up living, it will have to exist, because
Hive-on-Spark depends on it. We just thought it would be beneficial to Spark
and its users to make it generic enough to cover more than the Hive-on-Spark
use case, and to have it live as part of Spark itself.
> Add remote interface for SparkContext
> -------------------------------------
>
> Key: SPARK-3215
> URL: https://issues.apache.org/jira/browse/SPARK-3215
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Marcelo Vanzin
> Labels: hive
> Attachments: RemoteSparkContext.pdf
>
>
> A quick description of the issue: as part of running Hive jobs on top of
> Spark, it's desirable to have a SparkContext that is running in the
> background and listening for job requests for a particular user session.
> Running multiple contexts in the same JVM is not a very good solution. Not
> only does SparkContext currently have issues sharing the same JVM among
> multiple instances, but doing so also turns the JVM running the contexts into
> a huge bottleneck in the system.
> So I'm proposing a solution where we have a SparkContext running in a
> separate process and listening for requests from the client application via
> some RPC interface (most probably Akka).
> I'll attach a document shortly with the current proposal. Let's use this bug
> to discuss the proposal and any other suggestions.