[
https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745944#comment-15745944
]
Matt Cheah commented on SPARK-18278:
------------------------------------
[~rxin] - thanks for thinking about this!
The concerns around testing and support burden are certainly valid. The
alternatives also come with their own sets of concerns though.
If we publish the scheduler as a library:
The current scheduler code is not marked as public API. We would need to
refactor it into an API that the Apache project would support for third-party
use (like the K8s integration). There are (at least) two places where this
needs to be done:
* CoarseGrainedSchedulerBackend would need to become extendable, since all of
the schedulers (standalone, Mesos, yarn-client, and yarn-cluster) currently
extend this fairly complex class. The CoarseGrainedSchedulerBackend code
invokes its pluggable methods (doRequestTotalExecutors and doKillExecutors)
with particular expectations, and hence these expectations would also have to
remain stable as long as it is a public API.
* SparkSubmit would need to support third-party cluster managers. Currently
SparkSubmit's code includes special-case handling for the bundled cluster
managers; for example, in YARN mode spark-submit accepts --queue to specify the
queue in which to run the job. Thus we would also need to make spark-submit's
argument handling pluggable for other cluster managers' parameters.
Off the top of my head, I can think of numerous ways we could expose both of
these as plugins, but it's not immediately obvious what the best option is.
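To make the two extension points concrete, here is one possible shape, written
as a minimal Python sketch rather than as Spark code. Every name in it
(SchedulerBackend, ClusterManagerPlugin, KubernetesPlugin, the k8s:// master
prefix, and the --kubernetes-namespace flag) is hypothetical and chosen only
for illustration; none of it is an existing Spark API. The base class shows the
kind of contract around the do* hooks that would have to stay stable, and the
plugin shows how a cluster manager could contribute its own submit arguments.

```python
import argparse
from abc import ABC, abstractmethod


class SchedulerBackend(ABC):
    """Hypothetical public base class; stands in for an extendable backend."""

    def __init__(self):
        self.requested_total = 0

    def request_total_executors(self, total):
        # Contract the base class would have to keep stable: the hook only
        # sees non-negative totals, and the total is recorded only when the
        # hook reports success.
        if total < 0:
            raise ValueError("total must be non-negative")
        accepted = self.do_request_total_executors(total)
        if accepted:
            self.requested_total = total
        return accepted

    @abstractmethod
    def do_request_total_executors(self, total): ...

    @abstractmethod
    def do_kill_executors(self, executor_ids): ...


class ClusterManagerPlugin(ABC):
    """Hypothetical plugin interface for third-party cluster managers."""

    @abstractmethod
    def can_handle(self, master_url): ...

    @abstractmethod
    def add_submit_arguments(self, parser): ...

    @abstractmethod
    def create_backend(self, args): ...


class KubernetesBackend(SchedulerBackend):
    def __init__(self, namespace):
        super().__init__()
        self.namespace = namespace
        self.killed = []

    def do_request_total_executors(self, total):
        return True  # a real backend would ask the cluster for pods here

    def do_kill_executors(self, executor_ids):
        self.killed.extend(executor_ids)
        return True


class KubernetesPlugin(ClusterManagerPlugin):
    def can_handle(self, master_url):
        return master_url.startswith("k8s://")  # assumed URL scheme

    def add_submit_arguments(self, parser):
        # Assumed flag, playing the role --queue plays for YARN.
        parser.add_argument("--kubernetes-namespace", default="default")

    def create_backend(self, args):
        return KubernetesBackend(args.kubernetes_namespace)


def submit(argv, plugins):
    """Pick a plugin from --master, then let it extend the argument parser."""
    base = argparse.ArgumentParser(add_help=False)
    base.add_argument("--master", required=True)
    known, _ = base.parse_known_args(argv)
    plugin = next((p for p in plugins if p.can_handle(known.master)), None)
    if plugin is None:
        raise ValueError(f"no cluster manager plugin for {known.master}")
    parser = argparse.ArgumentParser(parents=[base])
    plugin.add_submit_arguments(parser)
    args = parser.parse_args(argv)
    return plugin.create_backend(args), args
```

In this sketch the core submit path knows nothing about Kubernetes: it parses
only --master, and everything cluster-specific lives in the plugin. Whether
that split is the right one for Spark is exactly the open question.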
If we fork the project:
Maintaining a fork places a burden on the fork maintainers to keep the fork up
to date with the mainline releases. It also leaves unclear how this feature and
its fork relate to the direction of the Spark project as a whole, and what the
timeline is for eventually re-integrating the fork. Is there a prior example of
this approach working in practice in the Spark community?
In either case (library or fork), there's also the question of how we
encourage alpha testing and early usage of this feature. If the code is not on
the mainline branch, there need to be alternative channels outside of the
Spark releases themselves to announce that this feature is available and that
we would like feedback on it. It would also be ideal for the code reviews to be
visible early on, so that everyone who watches the Spark repository can follow
the updates and progress of this feature.
Having said all of this, I think these issues can be navigated. If I had to
choose between maintaining a fork versus cleaning up the scheduler to make a
public API, I would choose the latter in the interest of clarifying the
relationship between the K8s effort and the mainline project, as well as for
making the scheduler code cleaner in general. However, it's not immediately
clear whether the effort required for these refactors is worthwhile when we
could instead include the K8s scheduler in the Apache releases as an
experimental feature, ignore its bugs and test failures for the next few
releases (that is, problems in the K8s-related code should never block
releases), and ship this as we currently do with YARN and Mesos.
I'd like to hear everyone's thoughts regarding the tradeoffs we are making
between these different approaches of pushing this feature forward.
> Support native submission of spark jobs to a kubernetes cluster
> ---------------------------------------------------------------
>
> Key: SPARK-18278
> URL: https://issues.apache.org/jira/browse/SPARK-18278
> Project: Spark
> Issue Type: Umbrella
> Components: Build, Deploy, Documentation, Scheduler, Spark Core
> Reporter: Erik Erlandson
> Attachments: SPARK-18278 - Spark on Kubernetes Design Proposal.pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting
> Spark applications to a kubernetes cluster. The submitted application runs
> in a driver executing on a kubernetes pod, and executor lifecycles are also
> managed as pods.