Janos Matyas created ZEPPELIN-3020:
--------------------------------------

             Summary: Add support to run Spark interpreter on a Kubernetes cluster
                 Key: ZEPPELIN-3020
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3020
             Project: Zeppelin
          Issue Type: New Feature
            Reporter: Janos Matyas


The goal of this PR is to make it possible to execute Spark notebooks on
Kubernetes in cluster mode, so that the Spark driver runs inside the Kubernetes
cluster. It is based on https://github.com/apache-spark-on-k8s/spark. Zeppelin
uses `spark-submit` to start RemoteInterpreterServer, which executes notebooks
on Spark.
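
The launch step can be sketched in Java as follows. This is a minimal
illustration, not Zeppelin's actual launcher: it assembles a `spark-submit`
argument list that starts RemoteInterpreterServer, switches to cluster deploy
mode when the master URL starts with `k8s://`, and appends the tokens of
SPARK_SUBMIT_OPTIONS. The class `SparkSubmitCommand`, the jar path and the
trailing port argument are hypothetical, the fully qualified server class name
is an assumption, and the whitespace split does not handle quoted values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a spark-submit command line that starts
// RemoteInterpreterServer. Not Zeppelin's actual launch code.
public class SparkSubmitCommand {

    // Assumed fully qualified name of Zeppelin's remote interpreter server.
    static final String SERVER_CLASS =
        "org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer";

    // A Kubernetes-specific Spark master URL has the form k8s://https://...
    static boolean isKubernetesMaster(String master) {
        return master != null && master.startsWith("k8s://");
    }

    static List<String> build(String master, String submitOptions,
                              String appJar, int port) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--class");
        cmd.add(SERVER_CLASS);
        cmd.add("--master");
        cmd.add(master);
        if (isKubernetesMaster(master)) {
            // Cluster mode: the driver, and with it RemoteInterpreterServer,
            // runs in a pod inside the Kubernetes cluster.
            cmd.add("--deploy-mode");
            cmd.add("cluster");
        }
        // Simplification: split SPARK_SUBMIT_OPTIONS on whitespace; quoted
        // values that contain spaces are not supported.
        if (submitOptions != null && !submitOptions.trim().isEmpty()) {
            cmd.addAll(Arrays.asList(submitOptions.trim().split("\\s+")));
        }
        cmd.add(appJar);
        cmd.add(String.valueOf(port));
        return cmd;
    }

    public static void main(String[] args) {
        // Kubernetes-specific options (driver/executor images etc.) are
        // expected to arrive through this environment variable.
        String options = System.getenv("SPARK_SUBMIT_OPTIONS");
        System.out.println(String.join(" ",
            build("k8s://https://kubernetes.default.svc:443", options,
                  "/path/to/spark-interpreter.jar", 30000)));
    }
}
```

Keeping the master-URL check in one predicate means a non-Kubernetes master
(for example `yarn`) produces the same command as before, only without the
cluster deploy mode.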
Kubernetes-specific `spark-submit` parameters, such as the driver, executor,
init-container and shuffle images, should be set in the SPARK_SUBMIT_OPTIONS
environment variable.

When the Spark interpreter is configured with a Kubernetes-specific Spark
master URL (k8s://https....), RemoteInterpreterServer is launched inside a
Spark driver pod on Kubernetes, so the Zeppelin server has to be able to
connect to this remote server. In a Kubernetes cluster, the best way to do
this is to create a Kubernetes service for RemoteInterpreterServer. This is
the reason for SparkK8RemoteInterpreterManagerProcess, which extends
RemoteInterpreterManagerProcess: it creates the Kubernetes service that maps
the port of RemoteInterpreterServer in the driver pod, and connects to this
service once the Spark driver pod is in the Running state.
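
The service-then-connect flow can be sketched in plain Java, without a
Kubernetes client library: render a Service manifest that maps the
RemoteInterpreterServer port of the driver pod, wait until the driver pod
reports the `Running` phase, and return the in-cluster DNS address to connect
to. This is a hypothetical sketch, not the PR's code. The class name, the
service name, the label `zeppelin-interpreter-id` (which the driver pod would
have to carry) and the `phaseSupplier` hook (which a real implementation would
back with Kubernetes API calls) are all assumptions, and the
`<service>.<namespace>.svc.cluster.local` address assumes the default cluster
DNS domain.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the flow described above; not the PR's code.
public class K8sInterpreterService {

    // ClusterIP Service forwarding `port` to the same port on the driver pod
    // selected by the hypothetical label zeppelin-interpreter-id.
    static String serviceManifest(String name, String namespace,
                                  String interpreterId, int port) {
        return String.join("\n",
            "apiVersion: v1",
            "kind: Service",
            "metadata:",
            "  name: " + name,
            "  namespace: " + namespace,
            "spec:",
            "  type: ClusterIP",
            "  selector:",
            "    zeppelin-interpreter-id: " + interpreterId,
            "  ports:",
            "  - name: interpreter",
            "    protocol: TCP",
            "    port: " + port,
            "    targetPort: " + port);
    }

    // Poll the driver pod phase until it is Running; fail on a terminal
    // phase (Succeeded/Failed), on interruption, or after the timeout.
    static void waitForRunning(Supplier<String> phaseSupplier,
                               Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            String phase = phaseSupplier.get();
            if ("Running".equals(phase)) {
                return;
            }
            if ("Failed".equals(phase) || "Succeeded".equals(phase)) {
                throw new IllegalStateException("Driver pod ended in phase " + phase);
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException(
                    "Timed out waiting for driver pod, last phase " + phase);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    // In-cluster DNS name of the service (default cluster.local domain).
    static String serviceHost(String name, String namespace) {
        return name + "." + namespace + ".svc.cluster.local";
    }

    public static void main(String[] args) {
        System.out.println(serviceManifest("zeppelin-spark-abc", "default", "abc", 30000));
        // Simulated phases standing in for Kubernetes API lookups.
        Iterator<String> phases = List.of("Pending", "Pending", "Running").iterator();
        waitForRunning(phases::next, Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("connect to "
            + serviceHost("zeppelin-spark-abc", "default") + ":30000");
    }
}
```

Connecting through the service rather than the pod IP means Zeppelin only
needs a stable name; waiting for `Running` first avoids connecting before the
interpreter server inside the driver pod can accept connections.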



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
