Matt Cheah created SPARK-19700:
----------------------------------
Summary: Design an API for pluggable scheduler implementations
Key: SPARK-19700
URL: https://issues.apache.org/jira/browse/SPARK-19700
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.1.0
Reporter: Matt Cheah
One point that was brought up in discussing SPARK-18278 was that schedulers
cannot easily be added to Spark without forking the whole project. The main
reason is that much of the scheduler's behavior fundamentally depends on the
CoarseGrainedSchedulerBackend class, which is not part of the public API of
Spark and is in fact quite a complex module. As resource management and
allocation continue to evolve, Spark will need to be integrated with more
cluster managers, but maintaining support for all possible allocators in the
Spark project would be untenable. Furthermore, it would be impossible for Spark
to support proprietary frameworks developed by specific users for their own
particular use cases.
Therefore, this ticket proposes making scheduler implementations fully
pluggable. The idea is that Spark will provide a Java/Scala interface that is
to be implemented by a scheduler that is backed by the cluster manager of
interest. The user can compile their scheduler's code into a JAR that is placed
on the driver's classpath. Finally, as is the case today, the scheduler
implementation is selected and dynamically loaded based on the master URL the
user provides.
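To make this concrete, below is a minimal sketch of what such a plugin
interface and its discovery could look like. All names and signatures here are
illustrative, not a committed API, and the two placeholder traits stand in for
Spark internals such as TaskScheduler and SchedulerBackend. Spark already does
something similar internally (the private ExternalClusterManager trait is
discovered via java.util.ServiceLoader and matched against the master URL), so
the proposal roughly amounts to stabilizing and broadening that hook.
{code:scala}
// Illustrative sketch only -- hypothetical names; placeholder traits stand in
// for Spark internals a real SPI would reference.
import java.util.ServiceLoader
import scala.collection.JavaConverters._

trait TaskSchedulerLike
trait SchedulerBackendLike

trait ExternalSchedulerProvider {
  /** Return true if this provider handles the given master URL. */
  def canCreate(masterURL: String): Boolean

  /** Build the cluster-manager-specific scheduler components. */
  def createScheduler(masterURL: String): (TaskSchedulerLike, SchedulerBackendLike)
}

object ExternalSchedulerProvider {
  /** Discover providers from JARs on the driver's classpath and pick the one
   *  that claims the user-supplied master URL. */
  def forMaster(masterURL: String): ExternalSchedulerProvider = {
    val matching = ServiceLoader
      .load(classOf[ExternalSchedulerProvider])
      .asScala
      .filter(_.canCreate(masterURL))
      .toList
    matching match {
      case p :: Nil => p
      case Nil =>
        throw new IllegalArgumentException(s"No scheduler plugin for master URL: $masterURL")
      case _ =>
        throw new IllegalArgumentException(s"Multiple plugins claim master URL: $masterURL")
    }
  }
}
{code}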
Determining the correct API is the most challenging problem. The current
CoarseGrainedSchedulerBackend handles many responsibilities, some of which will
be common across all cluster managers, and some which will be specific to a
particular cluster manager. For example, the particular mechanism for creating
the executor processes will differ between YARN and Mesos, but, once these
executors have started running, the means of submitting tasks to them over
Spark's Netty-based RPC layer is identical across the board.
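One possible factoring of these responsibilities is sketched below; it is
hypothetical and only meant to show the shape of the split. Executor
provisioning is isolated behind a cluster-manager-specific trait, while task
dispatch lives in a shared core that every implementation reuses.
{code:scala}
// Hypothetical factoring of CoarseGrainedSchedulerBackend's duties.
// ExecutorProvisioner is the cluster-manager-specific piece (YARN, Mesos,
// Kubernetes would each implement it); task dispatch stays in a shared core.
trait ExecutorProvisioner {
  /** Ask the cluster manager to start `count` additional executor processes. */
  def requestExecutors(count: Int): Unit

  /** Ask the cluster manager to tear down the given executors. */
  def killExecutors(executorIds: Seq[String]): Unit
}

// Shared core: identical across cluster managers once executors register.
class CommonSchedulerCore(provisioner: ExecutorProvisioner) {
  private var registered = Set.empty[String]

  def onExecutorRegistered(executorId: String): Unit = {
    registered += executorId
  }

  /** Send a serialized task to a registered executor. The actual RPC send is
   *  elided; in Spark it rides on the common Netty-based RPC layer regardless
   *  of which cluster manager started the executor. */
  def launchTask(executorId: String, serializedTask: Array[Byte]): Unit = {
    require(registered.contains(executorId), s"Unknown executor $executorId")
    // send(executorId, LaunchTask(serializedTask))  // common RPC path
  }

  def scaleUp(n: Int): Unit = provisioner.requestExecutors(n)
}
{code}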
We must also consider a plugin model and interface for submitting the
application, because different cluster managers support different
configuration options, and thus the driver must be bootstrapped accordingly.
For example, in YARN mode the application and Hadoop configuration must be
packaged and shipped to the distributed cache prior to launching the job. A
prototype of a Kubernetes implementation starts a Kubernetes pod that runs the
driver in cluster mode.
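A submission-side plugin might look roughly like the sketch below; again, the
names and the SubmissionHandle type are purely hypothetical. A YARN
implementation of prepareSubmission would stage the application and Hadoop
configuration into the distributed cache, while a Kubernetes implementation
would create the driver pod.
{code:scala}
// Hypothetical submission-side plugin; names are illustrative only.
trait ApplicationSubmitter {
  /** Return true if this submitter handles the given master URL. */
  def canSubmit(masterURL: String): Boolean

  /** Perform manager-specific staging (upload files, create pods, etc.)
   *  and describe how the driver should be started. */
  def prepareSubmission(
      masterURL: String,
      appJar: String,
      conf: Map[String, String]): SubmissionHandle
}

/** Opaque handle describing where and how the driver was (or will be) launched. */
case class SubmissionHandle(driverArgs: Seq[String], env: Map[String, String])
{code}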