[jira] [Commented] (SPARK-18278) Support native submission of spark jobs to a kubernetes cluster

Matt Cheah (JIRA) Tue, 13 Dec 2016 10:59:43 -0800

    [ https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745944#comment-15745944 ]


Matt Cheah commented on SPARK-18278:
------------------------------------

[~rxin] - thanks for thinking about this!

The concerns around testing and support burden are certainly valid. The 
alternatives also come with their own sets of concerns though.

If we publish the scheduler as a library:

The current code in the schedulers is not marked as public API. We would need 
to refactor the scheduler code to expose an API that the Apache project would 
support for third-party use (like the K8s integration). There are (at least) 
two places where this needs to be done:

* CoarseGrainedSchedulerBackend would need to become extendable, since all of 
the schedulers (standalone, Mesos, yarn-client, and yarn-cluster) currently 
extend this fairly complex class. The CoarseGrainedSchedulerBackend code 
invokes its pluggable methods (doRequestTotalExecutors and doKillExecutors) 
with particular expectations, and hence these expectations would also have to 
remain stable as long as it is a public API.
* SparkSubmit would need to support third-party cluster managers. Currently, 
SparkSubmit's code includes special-case handling for the bundled cluster 
managers; for example, in YARN mode spark-submit accepts --queue to specify the 
queue in which to run the job. Thus we would need to make spark-submit's 
argument handling pluggable as well, so that other cluster managers can define 
their own parameters.
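
To make the first point concrete, here is a minimal sketch of what a stable 
contract around the two pluggable methods might look like. It is written in 
Python purely for illustration and is not Spark's actual Scala code: the class 
names, the synchronous boolean return values, and the in-memory "pods" are all 
assumptions. Only the hook names mirror doRequestTotalExecutors and 
doKillExecutors.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class SchedulerBackendSketch(ABC):
    """Hypothetical stand-in for a public, extendable scheduler backend."""

    def __init__(self) -> None:
        self.requested_total = 0

    def request_total_executors(self, total: int) -> bool:
        # Expectation a public API would have to keep stable: the hook is only
        # called with a non-negative total, and the base class records the new
        # target only after the cluster manager acknowledges the request.
        if total < 0:
            raise ValueError("total executor count must be non-negative")
        acknowledged = self.do_request_total_executors(total)
        if acknowledged:
            self.requested_total = total
        return acknowledged

    def kill_executors(self, executor_ids: Iterable[str]) -> bool:
        # Expectation: the hook is never called with an empty list.
        ids = list(executor_ids)
        if not ids:
            return False
        return self.do_kill_executors(ids)

    @abstractmethod
    def do_request_total_executors(self, total: int) -> bool:
        """Ask the cluster manager to converge on `total` executors."""

    @abstractmethod
    def do_kill_executors(self, executor_ids: List[str]) -> bool:
        """Ask the cluster manager to remove the given executors."""


class InMemoryPodBackend(SchedulerBackendSketch):
    """Toy third-party backend: tracks 'pods' in a dict, makes no K8s calls."""

    def __init__(self) -> None:
        super().__init__()
        self.pods: Dict[str, str] = {}
        self._next_id = 0

    def do_request_total_executors(self, total: int) -> bool:
        # Only scales up; scaling down happens through explicit kill requests.
        while len(self.pods) < total:
            self.pods[f"exec-{self._next_id}"] = "Running"
            self._next_id += 1
        return True

    def do_kill_executors(self, executor_ids: List[str]) -> bool:
        killed = [eid for eid in executor_ids
                  if self.pods.pop(eid, None) is not None]
        return bool(killed)
```

In this sketch the third-party backend only implements the two hooks, while 
the argument checks and bookkeeping in the base class are the "expectations" 
that would have to stay stable once the class is public.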

Off the top of my head, I can think of numerous ways we could expose both of 
these as plugins, but it's not immediately obvious what the best option is.
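
As one possible shape for the spark-submit side, the sketch below lets each 
cluster manager register its own flags instead of hard-coding them in the 
argument parser. It is an illustration in Python, not SparkSubmit's real code: 
ClusterManagerPlugin, parse_submit_args, the k8s:// master prefix, and the 
--kubernetes-namespace flag with its config key are all hypothetical. Only the 
--queue flag and the spark.yarn.queue setting it populates come from existing 
YARN support.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ClusterManagerPlugin:
    """Hypothetical plugin: the flags one cluster manager contributes."""
    master_prefix: str        # master URLs this plugin handles
    options: Dict[str, str]   # command-line flag -> Spark config key


PLUGINS = [
    ClusterManagerPlugin("yarn", {"--queue": "spark.yarn.queue"}),
    # Hypothetical flag and config key, for illustration only.
    ClusterManagerPlugin("k8s://", {"--kubernetes-namespace":
                                    "spark.kubernetes.namespace"}),
]


def parse_submit_args(master: str, argv: List[str],
                      plugins: List[ClusterManagerPlugin]
                      ) -> Tuple[Dict[str, str], List[str]]:
    """Split argv into plugin-owned settings and untouched arguments."""
    plugin = next((p for p in plugins
                   if master.startswith(p.master_prefix)), None)
    if plugin is None:
        raise ValueError(f"no cluster manager plugin for master {master!r}")

    conf = {"spark.master": master}
    remaining: List[str] = []
    i = 0
    while i < len(argv):
        flag = argv[i]
        if flag in plugin.options:
            if i + 1 >= len(argv):
                raise ValueError(f"{flag} requires a value")
            conf[plugin.options[flag]] = argv[i + 1]
            i += 2
        else:
            # Not owned by this plugin: leave it for the generic parser.
            remaining.append(flag)
            i += 1
    return conf, remaining
```

With this layout, calling parse_submit_args("yarn", ["--queue", "etl", 
"app.jar"], PLUGINS) maps --queue to spark.yarn.queue and leaves app.jar 
alone, while the same flag under a k8s:// master is passed through untouched 
because that plugin does not declare it.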

If we fork the project:

Maintaining a fork places a burden on the fork maintainers to keep the fork up 
to date with the mainline releases. It also leaves unclear how this feature and 
its fork relate to the direction of the Spark project as a whole, and what the 
timeline is for eventually re-integrating the fork. Is there a prior example of 
this approach working in practice in the Spark community?


In either case (library or forking), there's also the question of how we 
encourage alpha testing and early usage of this feature. If the code is not on 
the mainline branch, there need to be alternative channels outside of the 
Spark releases themselves to announce that this feature is available and that 
we would like feedback on it. It would also be ideal for the code reviews to be 
visible early on, so that everyone who watches the Spark repository can follow 
the updates and progress of this feature.

Having said all of this, I think these issues can be navigated. If I had to 
choose between maintaining a fork versus cleaning up the scheduler to make a 
public API, I would choose the latter in the interest of clarifying the 
relationship between the K8s effort and the mainline project, as well as for 
making the scheduler code cleaner in general. However, it's not immediately 
clear whether the effort required for these refactors is worthwhile when we 
could instead include the K8s scheduler in the Apache releases as an 
experimental feature, ignore its bugs and test failures for the next few 
releases (that is, problems in the K8s-related code should never block 
releases), and ship this as we currently do with YARN and Mesos.

I'd like to hear everyone's thoughts regarding the tradeoffs we are making 
between these different approaches of pushing this feature forward.


> Support native submission of spark jobs to a kubernetes cluster
> ---------------------------------------------------------------
>
>                 Key: SPARK-18278
>                 URL: https://issues.apache.org/jira/browse/SPARK-18278
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build, Deploy, Documentation, Scheduler, Spark Core
>            Reporter: Erik Erlandson
>         Attachments: SPARK-18278 - Spark on Kubernetes Design Proposal.pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting 
> Spark applications to a kubernetes cluster.   The submitted application runs 
> in a driver executing on a kubernetes pod, and executor lifecycles are also 
> managed as pods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

