Repository: spark
Updated Branches:
  refs/heads/master 171f6ddad -> ded6d27e4
[SPARK-22648][K8S] Add documentation covering init containers and secrets

## What changes were proposed in this pull request?

This PR updates the Kubernetes documentation corresponding to the following features/changes in #19954.
* Ability to use remote dependencies through the init-container.
* Ability to mount user-specified secrets into the driver and executor pods.

vanzin jiangxb1987 foxish

Author: Yinan Li <[email protected]>

Closes #20059 from liyinan926/doc-update.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ded6d27e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ded6d27e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ded6d27e

Branch: refs/heads/master
Commit: ded6d27e4eb02e4530015a95794e6ed0586faaa7
Parents: 171f6dd
Author: Yinan Li <[email protected]>
Authored: Thu Dec 28 13:53:04 2017 +0900
Committer: Takuya UESHIN <[email protected]>
Committed: Thu Dec 28 13:53:04 2017 +0900

----------------------------------------------------------------------
 docs/running-on-kubernetes.md    | 194 ++++++++++++++++++++++++----------
 sbin/build-push-docker-images.sh |   3 +-
 2 files changed, 143 insertions(+), 54 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ded6d27e/docs/running-on-kubernetes.md
----------------------------------------------------------------------
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 0048bd9..e491329 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -69,17 +69,17 @@ building using the supplied script, or manually.
 
 To launch Spark Pi in cluster mode,
 
-{% highlight bash %}
+```bash
 $ bin/spark-submit \
     --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
     --deploy-mode cluster \
     --name spark-pi \
     --class org.apache.spark.examples.SparkPi \
     --conf spark.executor.instances=5 \
-    --conf spark.kubernetes.driver.docker.image=<driver-image> \
-    --conf spark.kubernetes.executor.docker.image=<executor-image> \
+    --conf spark.kubernetes.driver.container.image=<driver-image> \
+    --conf spark.kubernetes.executor.container.image=<executor-image> \
     local:///path/to/examples.jar
-{% endhighlight %}
+```
 
 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
 `spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
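For the `<k8s-apiserver-host>` and `<k8s-apiserver-port>` placeholders in the example above, the API server endpoint can be looked up with standard kubectl tooling. A minimal sketch, assuming `kubectl` is already configured against the target cluster:

```bash
# Print cluster endpoint information; the "Kubernetes master" line shows
# the https://<host>:<port> endpoint to use in the k8s:// master URL.
$ kubectl cluster-info
```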
@@ -120,6 +120,54 @@ by their appropriate remote URIs. Also, application dependencies can be pre-moun
 Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
 
+### Using Remote Dependencies
+When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods
+need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading
+the dependencies so the driver and executor containers can use them locally. This requires users to specify the container
+image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. For example, users
+can simply add the following option to the `spark-submit` command to specify the init-container image:
+
+```
+--conf spark.kubernetes.initContainer.image=<init-container image>
+```
+
+The init-container handles remote dependencies specified in `spark.jars` (or the `--jars` option of `spark-submit`) and
+`spark.files` (or the `--files` option of `spark-submit`). It also handles remotely hosted main application resources, e.g.,
+the main application jar. The following shows an example of using remote dependencies with the `spark-submit` command:
+
+```bash
+$ bin/spark-submit \
+    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+    --deploy-mode cluster \
+    --name spark-pi \
+    --class org.apache.spark.examples.SparkPi \
+    --jars https://path/to/dependency1.jar,https://path/to/dependency2.jar \
+    --files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2 \
+    --conf spark.executor.instances=5 \
+    --conf spark.kubernetes.driver.container.image=<driver-image> \
+    --conf spark.kubernetes.executor.container.image=<executor-image> \
+    --conf spark.kubernetes.initContainer.image=<init-container image> \
+    https://path/to/examples.jar
+```
+
+## Secret Management
+Kubernetes [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) can be used to provide credentials for a
+Spark application to access secured services. To mount a user-specified secret into the driver container, users can use
+the configuration property of the form `spark.kubernetes.driver.secrets.[SecretName]=<mount path>`. Similarly, the
+configuration property of the form `spark.kubernetes.executor.secrets.[SecretName]=<mount path>` can be used to mount a
+user-specified secret into the executor containers. Note that the secret to be mounted is assumed to be in the same
+namespace as the driver and executor pods. For example, to mount a secret named `spark-secret` onto the path
+`/etc/secrets` in both the driver and executor containers, add the following options to the `spark-submit` command:
+
+```
+--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
+--conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets
+```
+
+Note that if an init-container is used, any secret mounted into the driver container will also be mounted into the
+init-container of the driver. Similarly, any secret mounted into an executor container will also be mounted into the
+init-container of the executor.
+
 ## Introspection and Debugging
 
 These are the different ways in which you can investigate a running/completed Spark application, monitor progress, and
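A secret like the `spark-secret` mounted in the Secret Management example above must already exist in the same namespace as the driver and executor pods before the application is submitted. A minimal sketch of creating one with `kubectl`; the key name and file path are purely illustrative:

```bash
# Create a generic secret named spark-secret from a local credentials file.
# Once mounted, it appears at <mount path>/credentials.conf in the containers.
$ kubectl create secret generic spark-secret \
    --from-file=credentials.conf=/path/to/credentials.conf
```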
@@ -275,7 +323,7 @@ specific to Spark on Kubernetes.
   <td><code>(none)</code></td>
   <td>
     Container image to use for the driver.
-    This is usually of the form `example.com/repo/spark-driver:v1.0.0`.
+    This is usually of the form <code>example.com/repo/spark-driver:v1.0.0</code>.
     This configuration is required and must be provided by the user.
   </td>
 </tr>
@@ -284,7 +332,7 @@ specific to Spark on Kubernetes.
   <td><code>(none)</code></td>
   <td>
     Container image to use for the executors.
-    This is usually of the form `example.com/repo/spark-executor:v1.0.0`.
+    This is usually of the form <code>example.com/repo/spark-executor:v1.0.0</code>.
     This configuration is required and must be provided by the user.
   </td>
 </tr>
@@ -528,51 +576,91 @@ specific to Spark on Kubernetes.
     </td>
   </tr>
   <tr>
-    <td><code>spark.kubernetes.driver.limit.cores</code></td>
-    <td>(none)</td>
-    <td>
-      Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod.
-    </td>
-  </tr>
-  <tr>
-    <td><code>spark.kubernetes.executor.limit.cores</code></td>
-    <td>(none)</td>
-    <td>
-      Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application.
-    </td>
-  </tr>
-  <tr>
-    <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
-    <td>(none)</td>
-    <td>
-      Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
-      configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
-      will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
-      <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
-    </td>
-  </tr>
-  <tr>
-    <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
-    <td>(none)</td>
-    <td>
-      Add the environment variable specified by <code>EnvironmentVariableName</code> to
-      the Driver process. The user can specify multiple of these to set multiple environment variables.
-    </td>
-  </tr>
-  <tr>
-    <td><code>spark.kubernetes.mountDependencies.jarsDownloadDir</code></td>
-    <td><code>/var/spark-data/spark-jars</code></td>
-    <td>
-      Location to download jars to in the driver and executors.
-      This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
-    </td>
-  </tr>
-  <tr>
-    <td><code>spark.kubernetes.mountDependencies.filesDownloadDir</code></td>
-    <td><code>/var/spark-data/spark-files</code></td>
-    <td>
-      Location to download jars to in the driver and executors.
-      This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
-    </td>
-  </tr>
+  <td><code>spark.kubernetes.driver.limit.cores</code></td>
+  <td>(none)</td>
+  <td>
+    Specify the hard CPU <a href="https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container">limit</a> for the driver pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.limit.cores</code></td>
+  <td>(none)</td>
+  <td>
+    Specify the hard CPU <a href="https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container">limit</a> for each executor pod launched for the Spark Application.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
+  <td>(none)</td>
+  <td>
+    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
+    configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
+    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
+    <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the environment variable specified by <code>EnvironmentVariableName</code> to
+    the Driver process. The user can specify multiple of these to set multiple environment variables.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.jarsDownloadDir</code></td>
+  <td><code>/var/spark-data/spark-jars</code></td>
+  <td>
+    Location to download jars to in the driver and executors.
+    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.filesDownloadDir</code></td>
+  <td><code>/var/spark-data/spark-files</code></td>
+  <td>
+    Location to download files to in the driver and executors.
+    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.timeout</code></td>
+  <td>300s</td>
+  <td>
+    Timeout in seconds before aborting the attempt to download and unpack dependencies from remote locations into
+    the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.maxSimultaneousDownloads</code></td>
+  <td>5</td>
+  <td>
+    Maximum number of remote dependencies to download simultaneously in a driver or executor pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.initContainer.image</code></td>
+  <td>(none)</td>
+  <td>
+    Container image for the <a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/">init-container</a> of the driver and executors for downloading dependencies. This is usually of the form <code>example.com/repo/spark-init:v1.0.0</code>. This configuration is optional; it must be provided by the user if the application uses any dependencies that are not local to the containers and must be downloaded remotely.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secret</a> named <code>SecretName</code> to the driver pod on the path specified in the value. For example,
+    <code>spark.kubernetes.driver.secrets.spark-secret=/etc/secrets</code>. Note that if an init-container is used,
+    the secret will also be added to the init-container in the driver pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.secrets.[SecretName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secret</a> named <code>SecretName</code> to the executor pod on the path specified in the value. For example,
+    <code>spark.kubernetes.executor.secrets.spark-secret=/etc/secrets</code>. Note that if an init-container is used,
+    the secret will also be added to the init-container in the executor pod.
+  </td>
+</tr>
 </table>
\ No newline at end of file
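To see how the two download-related properties above work together, here is a hedged sketch of loosening them for an application with many large remote dependencies; the values are illustrative, not recommendations. The init-container would then wait up to 10 minutes and fetch up to 10 dependencies in parallel:

```
--conf spark.kubernetes.mountDependencies.timeout=600s
--conf spark.kubernetes.mountDependencies.maxSimultaneousDownloads=10
```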
http://git-wip-us.apache.org/repos/asf/spark/blob/ded6d27e/sbin/build-push-docker-images.sh
----------------------------------------------------------------------
diff --git a/sbin/build-push-docker-images.sh b/sbin/build-push-docker-images.sh
index 4546e98..b313759 100755
--- a/sbin/build-push-docker-images.sh
+++ b/sbin/build-push-docker-images.sh
@@ -20,7 +20,8 @@
 # with Kubernetes support.
 
 declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
-                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile )
+                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
+                  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )
 
 function build {
   docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
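With the new `spark-init` entry in the script's image map, the same build-and-push flow now covers the init-container image alongside the driver and executor images. A hedged usage sketch, assuming the script is run from a Spark distribution directory and that `<repo>` and `<tag>` are replaced with a real registry and version:

```bash
# Build the spark-base, spark-driver, spark-executor, and spark-init images,
# then push all of them to the configured registry.
$ ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build
$ ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push
```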
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]