This is an automated email from the ASF dual-hosted git repository.
yuchaoran pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-site.git
The following commit(s) were added to refs/heads/master by this push:
new 65ce6c5bb2 [YUNIKORN-2886] update Spark operator documentation for YuniKorn integration (#474)
65ce6c5bb2 is described below
commit 65ce6c5bb26441134917a69a9f5cab9085057025
Author: Hsien-Cheng(Ryan) Huang <[email protected]>
AuthorDate: Mon Oct 14 09:59:58 2024 +0800
[YUNIKORN-2886] update Spark operator documentation for YuniKorn integration (#474)
* [YUNIKORN-2886] update Spark operator documentation for YuniKorn integration
Co-authored-by: 陳昱霖(Yu-Lin Chen) <[email protected]>
Co-authored-by: Chaoran Yu <[email protected]>
---
docs/user_guide/workloads/run_spark.md | 82 +++++++++++++++++++++++++++++++---
1 file changed, 77 insertions(+), 5 deletions(-)
diff --git a/docs/user_guide/workloads/run_spark.md b/docs/user_guide/workloads/run_spark.md
index 290fada90b..fff6e1b2c2 100644
--- a/docs/user_guide/workloads/run_spark.md
+++ b/docs/user_guide/workloads/run_spark.md
@@ -25,12 +25,84 @@ specific language governing permissions and limitations
under the License.
-->
+## Run a Spark job with Spark Operator
+
+:::note
+Pre-requisites:
+- This tutorial assumes YuniKorn is [installed](../../get_started/get_started.md) under the namespace `yunikorn`
+- Use spark-operator version >= 2.0 to enable support for YuniKorn gang scheduling
+:::
+
+### Install YuniKorn
+
+The following script installs YuniKorn under the namespace `yunikorn`; refer to [Get Started](../../get_started/get_started.md) for more details.
+
+```shell script
+helm repo add yunikorn https://apache.github.io/yunikorn-release
+helm repo update
+helm install yunikorn yunikorn/yunikorn --create-namespace --namespace yunikorn
+```
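+
+As a quick sanity check (assuming the release name and namespace `yunikorn` used above), you can verify that the YuniKorn pods are running:
+
+```shell script
+kubectl get pods --namespace yunikorn
+```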
+
+### Install Spark Operator
+
+Install `spark-operator` with `controller.batchScheduler.enable=true` and set `controller.batchScheduler.default=yunikorn` to enable gang scheduling. Setting the default batch scheduler to YuniKorn is optional, since you can also specify the scheduler per application, but it is recommended.
+Also note that the total requested memory for the Spark job is the sum of the memory requested for the driver and for all executors, where each is computed as below:
+* Driver requested memory = `spark.driver.memory` + `spark.driver.memoryOverhead`
+* Executor requested memory = `spark.executor.memory` + `spark.executor.memoryOverhead` + `spark.executor.pyspark.memory`
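+
+For example (hypothetical values rather than Spark defaults): with `spark.driver.memory=1g`, `spark.driver.memoryOverhead=512m`, and two executors each using `spark.executor.memory=2g` and `spark.executor.memoryOverhead=512m` (no separate PySpark memory), the job requests 1.5g for the driver plus 2 × 2.5g for the executors, i.e. 6.5g in total.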
+
+```shell script
+helm repo add spark-operator https://kubeflow.github.io/spark-operator
+helm repo update
+helm install spark-operator spark-operator/spark-operator \
+ --create-namespace \
+ --namespace spark-operator \
+ --set controller.batchScheduler.enable=true \
+ --set controller.batchScheduler.default=yunikorn
+```
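+
+As a quick check that the batch-scheduler settings took effect (assuming the release name `spark-operator` and namespace `spark-operator` used above), you can inspect the deployed Helm values and the controller pods:
+
+```shell script
+helm get values spark-operator --namespace spark-operator
+kubectl get pods --namespace spark-operator
+```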
+
+### Create an example application
+
+Create a Spark application to run a sample Spark Pi job.
+
+```shell script
+cat <<EOF | kubectl apply -f -
+apiVersion: sparkoperator.k8s.io/v1beta2
+kind: SparkApplication
+metadata:
+  name: spark-pi-yunikorn
+  namespace: default
+spec:
+  type: Scala
+  mode: cluster
+  image: spark:3.5.2
+  imagePullPolicy: IfNotPresent
+  mainClass: org.apache.spark.examples.SparkPi
+  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
+  sparkVersion: 3.5.2
+  driver:
+    cores: 1
+    memory: 512m
+    serviceAccount: spark-operator-spark # default service account created by spark operator
+  executor:
+    instances: 2
+    cores: 1
+    memory: 512m
+  batchScheduler: yunikorn
+  batchSchedulerOptions:
+    queue: root.default
+EOF
+```
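+
+After the resource is created, you can follow its progress through the SparkApplication status and the driver pod events. The commands below are a sketch: they assume the names used in the example above and the operator's default `<name>-driver` pod naming.
+
+```shell script
+kubectl get sparkapplication spark-pi-yunikorn --namespace default
+kubectl describe pod spark-pi-yunikorn-driver --namespace default
+```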
+
+For more details, see the [Spark operator official documentation](https://www.kubeflow.org/docs/components/spark-operator/user-guide/yunikorn-integration/).
+
+## Deploy Spark job using Spark submit
+
:::note
This document assumes you have YuniKorn and its admission-controller both installed. Please refer to [get started](../../get_started/get_started.md) to see how that is done.
:::
-## Prepare the docker image for Spark
+### Prepare the docker image for Spark
To run Spark on Kubernetes, you'll need the Spark docker images. You can 1) use the docker images provided by the Spark team, or 2) build one from scratch.
@@ -46,7 +118,7 @@ in the Spark documentation. Simplified steps:
Recommendation is to use the official images with different spark versions in the [dockerhub](https://hub.docker.com/r/apache/spark/tags)
-## Create a namespace for Spark jobs
+### Create a namespace for Spark jobs
Create a namespace:
@@ -59,7 +131,7 @@ metadata:
EOF
```
-## Create service account and role binding
+### Create service account and role binding
Create service account and role bindings inside the `spark-test` namespace:
@@ -105,7 +177,7 @@ Do NOT use `ClusterRole` and `ClusterRoleBinding` to run Spark jobs in productio
security context for running Spark jobs. See more about how to configure proper RBAC rules [here](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).
:::
-## Submit a Spark job
+### Submit a Spark job
If this is running from a local machine, you will need to start the proxy in order to talk to the api-server.
@@ -150,7 +222,7 @@ The spark-pi result is in the driver pod.

-## What happens behind the scenes?
+### What happens behind the scenes?
When the Spark job is submitted to the cluster, the job is submitted to `spark-test` namespace. The Spark driver pod will be firstly created under this namespace. Since this cluster has YuniKorn admission-controller enabled, when the driver pod
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]