Hi community,

About a year ago I started working on a patch that adds Spark on Kubernetes
support to Apache Livy, as part of a project I've been working on. Since
then I've created a PR,
https://github.com/apache/incubator-livy/pull/167, which has already been
discussed and reviewed a lot. After finalizing the work based on the PR
discussions, I started splitting the base PR into smaller pieces that
separate the core and aux functionality, which should make it easier to
review and merge. The first core PR is
https://github.com/apache/incubator-livy/pull/249.

I've also created repos with Docker images
(https://github.com/jahstreet/spark-on-kubernetes-docker) and Helm charts
(https://github.com/jahstreet/spark-on-kubernetes-helm) for a possible
stack that users may want to run Livy on Kubernetes with. In the future
these could be partially moved to the Livy repo, to keep the artifacts
required to run Livy on Kubernetes in a single place.

So far I've received positive feedback about the patch from more than 10
projects. Several of them can be found in the discussions of the base PR.
The repos supporting this feature also have around 35 stars and 15 forks in
total and have been referenced in Spark-related Stack Overflow and
Kubernetes Slack channel discussions. So users are already using it.

You may think "What does this guy want from us then!?"... Well, I would
like to ask for your time and expertise to help push it forward and,
ideally, get it merged.

I probably should have checked with the contributors before I started
coding whether this feature has value for the project and how best to
implement it, but I hope it is never too late ;) So I'm here to share with
you the thoughts behind it.

The idea of Livy on Kubernetes is simply to replicate the logic it has for
the YARN API against the Kubernetes API, which is easy to do since the
interfaces for the YARN API are very similar to the ones for Kubernetes.
Still, this easy-to-do patch opens the doors to Kubernetes for Livy, which
seems really useful for the community given the popularity of Kubernetes
itself and the latest Spark releases supporting Kubernetes as well.

Proposed Livy job submission flow:
- Generate an appTag and add
`spark.kubernetes.[driver/executor].label.SPARK_APP_TAG_LABEL` to the Spark
config
- Run spark-submit in cluster mode with a Kubernetes master
- Start a monitoring thread which resolves the Spark Driver and Executor
Pods using the `SPARK_APP_TAG_LABEL`s assigned during the job submission
- Create additional Kubernetes resources if necessary: Spark UI service,
Ingress, CRDs, etc.
- Fetch Spark Pod statuses, Driver logs and other diagnostic information
while the Spark Pods are running
- Remove the Spark job resources (completed/failed Driver Pod, Service,
ConfigMap, etc.) from the cluster once the configured timeout has passed
after the job completes or fails
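
The first steps of the flow above could look roughly like the sketch below
(Python for brevity; Livy itself is written in Scala). It assumes the label
name `spark-app-tag` as a stand-in for SPARK_APP_TAG_LABEL, and all function
names are hypothetical and not taken from the PR.

```python
import uuid

# Assumed label name; the proposal only refers to it as SPARK_APP_TAG_LABEL.
SPARK_APP_TAG_LABEL = "spark-app-tag"


def generate_app_tag(prefix="livy-batch"):
    """Return a unique tag used to find the job's Pods later."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


def label_conf(app_tag):
    """Spark config entries that label both Driver and Executor Pods."""
    return {
        f"spark.kubernetes.driver.label.{SPARK_APP_TAG_LABEL}": app_tag,
        f"spark.kubernetes.executor.label.{SPARK_APP_TAG_LABEL}": app_tag,
    }


def spark_submit_command(master_url, app_resource, app_tag, extra_conf=None):
    """Build a cluster-mode spark-submit command line as a list of arguments."""
    conf = dict(extra_conf or {})
    conf.update(label_conf(app_tag))
    cmd = ["spark-submit", "--master", master_url, "--deploy-mode", "cluster"]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_resource)
    return cmd


def label_selector(app_tag):
    """Label selector a monitoring thread could use to resolve the Pods."""
    return f"{SPARK_APP_TAG_LABEL}={app_tag}"
```

For example, `spark_submit_command("k8s://https://kubernetes.default.svc",
"local:///path/to/app.jar", generate_app_tag())` yields an argument list
that can be handed to a process launcher, and `label_selector(tag)` gives
the selector for listing the matching Driver and Executor Pods. The master
URL and jar path here are placeholders.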

The core functionality (covered by
https://github.com/apache/incubator-livy/pull/249):
- Submission of Batch jobs and Interactive sessions
- Caching Driver logs and Kubernetes Pods diagnostics

Aux features (introduced in
https://github.com/apache/incubator-livy/pull/167 and planned to be pushed
after the core PR is merged):
- Proxying the Spark UI with Kubernetes Ingresses and providing access to
it via a link on the Livy UI while the job is running (Nginx ingress as an
example)
- Proxying the Spark History Server with Kubernetes Ingresses and providing
access to it via a link on the Livy UI after the job completes or fails
(Nginx ingress as an example)

Misc:
- Add documentation to the Livy website, including a guide on how to run
Livy on Kubernetes with the possible config options
- Add Docker manifests and Helm charts

With Livy running on Kubernetes we get not only the ability to run Spark on
Kubernetes through the REST API, but also out-of-the-box integrations:
- Jupyter notebooks (with JupyterHub on Kubernetes) through the Sparkmagic
kernel
- Extensive monitoring with the Prometheus and Grafana stack
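
For illustration, submitting a batch job through Livy's REST API is a POST
to `/batches` with a JSON body. The sketch below only builds the request and
does not send it; the helper name is hypothetical, and the host, jar path
and class name used in the example call are placeholders.

```python
import json
import urllib.request


def build_batch_request(livy_url, file, class_name=None, args=None, conf=None):
    """Build (but do not send) a POST /batches request for Livy."""
    body = {"file": file}
    if class_name:
        body["className"] = class_name
    if args:
        body["args"] = list(args)
    if conf:
        body["conf"] = dict(conf)
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A call such as `build_batch_request("http://livy.example:8998",
"local:///path/to/app.jar", class_name="org.example.Main")` returns a
request object that can then be passed to `urllib.request.urlopen`.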

Some more ideas can be found in the readme here:
https://github.com/jahstreet/spark-on-kubernetes-helm.

I would be thankful to everyone who takes the time to review the feature I
propose, and I would appreciate any help or advice on the right steps
towards getting it released.

Kind regards,
Aliaksandr Sasnouskikh
https://github.com/jahstreet/
https://www.linkedin.com/in/jahstreet/
+31 (0) 6 51 29 58 39
