Hi community, About a year ago I've started to work on the patch to Apache Livy for Spark on Kubernetes support in the scope of the project I've been working on. Since that time I've created a PR https://github.com/apache/incubator-livy/pull/167 which have already been discussed and reviewed a lot. After finalizing the work in the result of the PR discussions I've started to split the work introduced in the base PR into smaller pieces to make it easier to separate the core and aux functionality, and as a result - easier to review and merge. The first core PR is https://github.com/apache/incubator-livy/pull/249.
Also I've created the repos with Docker images ( https://github.com/jahstreet/spark-on-kubernetes-docker) and Helm charts ( https://github.com/jahstreet/spark-on-kubernetes-helm) with the possible stack the users may want to use Livy on Kubernetes with, which potentially in the future can be partially moved to Livy repo to keep the artifacts required to run Livy on Kubernetes in a single place. Until now I've received the positive feedback from more than 10 projects about the usage of the patch. Several of them could be found in the discussions of the base PR. Also my repos supporting this feature have around 35 stars and 15 forks in total and were referenced in Spark related Stackoverflow and Kubernetes slack channel discussions. So the users use it already. You may think "What this guy wants from us then!?"... Well, I would like to ask for your time and expertise to help push it forward and ideally make it merged. Probably before I started coding I should have checked with the contributors if this feature may have value for the project and how is better to implement it, but I hope it is never too late;) So I'm here to share with you the the thoughts behind it. The idea of Livy on Kubernetes is simply to replicate the logic it has for Yarn API to Kubernetes API, which can be easily done since the interfaces for the Yarn API are really similar to the ones of the Kubernetes. Nevertheless this easy-to-do patch opens Livy the doors to Kubernetes which seems to be really useful for the community taking into account the popularity of Kubernetes itself and the latest releases of Spark supporting Kubernetes as well. Proposed Livy job submission flow: - Generate appTag and add `spark.kubernetes.[driver/executor].label.SPARK_APP_TAG_LABEL` to Spark config - Run spark-submit in cluster-mode with Kubernetes master - Start monitoring thread which resolves Spark Driver and Executor Pods using the `SPARK_APP_TAG_LABEL`s assigned during the job submission - Create additional Kubernetes resource if necessary: Spark UI service, Ingress, CRDs, etc. - Fetch Spark Pods statuses, Driver logs and other diagnostics information while Spark Pods are running - Remove Spark job resources (completed/failed Driver Pod, Service, ConfigMap, etc.) from the cluster after the job completion/failure after the configured timeout The core functionality (covered by https://github.com/apache/incubator-livy/pull/249): - Submission of Batch jobs and Interactive sessions - Caching Driver logs and Kubernetes Pods diagnostics Aux features (introduced in https://github.com/apache/incubator-livy/pull/167 and planned to be pushed after the core PR is merged): - Proxying Spark UI with Kubernetes Ingress'es and providing the access to it with the link on the Livy UI while the job is running (Nginx ingress as an example) - Proxying Spark History Server with Kubernetes Ingress'es and providing the access to it with the link on the Livy UI after the job completion/failure (Nginx ingress as an example) Misc: - Add documentation to the Livy web-site and the guide on how to run Livy on Kubernetes with the possible config options - Add Docker manifests and Helm charts Having Livy running on Kubernetes we do get not only the possibility to run Spark on Kubernetes with the REST API but also get out of the box integrations: - Jupyter notebooks (with JupyterHub on Kubernetes) through Sparkmagic kernel - Extensive monitoring with Prometheus and Grafana stack Some more ideas could be found in the readme here: https://github.com/jahstreet/spark-on-kubernetes-helm. I would be thankful to everyone paying some time to review the feature I propose and appreciate any help or advices on the right steps towards making the feature released. Kind regards, Aliaksandr Sasnouskikh https://github.com/jahstreet/ https://www.linkedin.com/in/jahstreet/ +31 (0) 6 51 29 58 39