This is an automated email from the ASF dual-hosted git repository.
mmack pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 0b4302e5f95 [Documentation] Update docs to run SparkPipelineRunner on a Kubernetes cluster (closes #27984)
0b4302e5f95 is described below
commit 0b4302e5f95f2dc9b6658c13d5d1aa798cfba668
Author: Hao Xu <[email protected]>
AuthorDate: Fri Sep 1 05:33:12 2023 -0700
[Documentation] Update docs to run SparkPipelineRunner on a Kubernetes cluster (closes #27984)
---
.../site/content/en/documentation/runners/spark.md | 47 +++++++++++++++++++++-
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/website/www/site/content/en/documentation/runners/spark.md b/website/www/site/content/en/documentation/runners/spark.md
index dcc166873dc..29ef5c28102 100644
--- a/website/www/site/content/en/documentation/runners/spark.md
+++ b/website/www/site/content/en/documentation/runners/spark.md
@@ -487,5 +487,48 @@ Provided SparkContext and StreamingListeners are not supported on the Spark port
{{< /paragraph >}}
### Kubernetes
-
-An [example](https://github.com/cometta/python-apache-beam-spark) of configuring Spark to run Apache beam job
+#### Submit a Beam job without a job server
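+To submit a Beam job directly to a Spark cluster running on Kubernetes, without spinning up a separate job server, run `spark-submit` with the pod templates and the job jar. Here `MASTER_URL` is the Kubernetes API server address in the form `k8s://https://<host>:<port>`: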
+```
+spark-submit --master MASTER_URL \
+ --conf spark.kubernetes.driver.podTemplateFile=driver_pod_template.yaml \
+ --conf spark.kubernetes.executor.podTemplateFile=executor_pod_template.yaml \
+ --class org.apache.beam.runners.spark.SparkPipelineRunner \
+ --conf spark.kubernetes.container.image=apache/spark:v3.3.2 \
+ ./wc_job.jar
+```
+As with running a Beam job on Dataproc, you can bundle the pipeline into the job jar used above as shown below. This example uses the `PROCESS` type of [SDK harness](https://beam.apache.org/documentation/runtime/sdk-harness-config/), which runs the SDK harness as a process on each Spark executor instead of in a Docker container.
+```
+python -m beam_example_wc \
+ --runner=SparkRunner \
+ --output_executable_path=./wc_job.jar \
+ --environment_type=PROCESS \
+ --environment_config='{"command": "/opt/apache/beam/boot"}' \
+ --spark_version=3
+```
+
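+The following is an example Kubernetes executor pod template. The init container is required: it copies the Beam SDK harness boot binary from the Beam SDK image into a shared `emptyDir` volume, which the executor container mounts at `/opt/apache/beam/` so that the `PROCESS` environment can launch `/opt/apache/beam/boot`.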
+```
+spec:
+  containers:
+  - name: spark-kubernetes-executor
+    volumeMounts:
+    - name: beam-data
+      mountPath: /opt/apache/beam/
+  initContainers:
+  - name: init-beam
+    image: apache/beam_python3.7_sdk
+    command:
+    - cp
+    - /opt/apache/beam/boot
+    - /init-container/data/boot
+    volumeMounts:
+    - name: beam-data
+      mountPath: /init-container/data
+  volumes:
+  - name: beam-data
+    emptyDir: {}
+```
+
+#### Submit a Beam job with a job server
+See this [example](https://github.com/cometta/python-apache-beam-spark) of configuring Spark to run an Apache Beam job with a job server.
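The `spark-submit` invocation in this change needs a Spark-on-Kubernetes master URL, which has the form `k8s://https://<host>:<port>`. The following is a minimal Python sketch that assembles the same command as an argument list. The helper name `build_spark_submit_cmd` and the API server address are hypothetical placeholders; the configuration keys, main class, image, and jar name are the ones used in the diff.

```python
import shlex


def build_spark_submit_cmd(api_server, jar_path, image="apache/spark:v3.3.2"):
    """Hypothetical helper: assemble the spark-submit argv used in this change."""
    # Spark on Kubernetes expects a master URL such as k8s://https://<host>:<port>;
    # prepend the k8s:// scheme if the caller passed a bare API server URL.
    master = api_server if api_server.startswith("k8s://") else "k8s://" + api_server
    return [
        "spark-submit",
        "--master", master,
        # Pod templates for the driver and the executors.
        "--conf", "spark.kubernetes.driver.podTemplateFile=driver_pod_template.yaml",
        "--conf", "spark.kubernetes.executor.podTemplateFile=executor_pod_template.yaml",
        # Main class passed to spark-submit for the Beam job jar.
        "--class", "org.apache.beam.runners.spark.SparkPipelineRunner",
        "--conf", "spark.kubernetes.container.image=" + image,
        jar_path,
    ]


if __name__ == "__main__":
    # Placeholder address; `kubectl cluster-info` reports the real API server URL.
    cmd = build_spark_submit_cmd("https://k8s-api.example.com:6443", "./wc_job.jar")
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` runs the same command without going through shell quoting at all.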
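For a `PROCESS` environment, the Beam Python SDK parses `--environment_config` as JSON, so the value must reach the program with plain double quotes. Inside single quotes the shell keeps backslashes literally, so a backslash-escaped value such as `'{\"command\": ...}'` is passed through unchanged and fails to parse. The minimal Python sketch below generates the flag with `json.dumps` and `shlex.quote`, then mimics the shell to check what the program would receive. Both function names are hypothetical.

```python
import json
import shlex


def process_environment_flag(boot_path="/opt/apache/beam/boot"):
    """Hypothetical helper: build --environment_config for a PROCESS environment."""
    # json.dumps produces plain double quotes; shlex.quote wraps the result in
    # single quotes so the shell passes it to the program unchanged.
    return "--environment_config=" + shlex.quote(json.dumps({"command": boot_path}))


def config_seen_by_program(shell_arg):
    """Mimic the shell: split the argument, then parse the value as JSON."""
    token = shlex.split(shell_arg)[0]          # the shell removes the quoting
    return json.loads(token.split("=", 1)[1])  # raises ValueError if not JSON


if __name__ == "__main__":
    good = process_environment_flag()
    print(good)
    print(config_seen_by_program(good))

    # Inside single quotes the backslashes are kept, so this is not valid JSON.
    bad = r"--environment_config='{\"command\": \"/opt/apache/beam/boot\"}'"
    try:
        config_seen_by_program(bad)
    except json.JSONDecodeError as err:
        print("rejected:", err.msg)
```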
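The executor pod template only works if its paths line up. The `cp` destination must lie inside the volume that the init container mounts, and the same volume, mounted in the executor, must expose the binary at the path that the `PROCESS` environment launches. The small Python sketch below checks this against a dict that mirrors the template in this change. The function name `boot_path_in_executor` is hypothetical, and the sketch assumes one init container and one executor container, as in the example.

```python
import posixpath

# Dict mirror of the executor pod template shown in this change.
EXECUTOR_TEMPLATE = {
    "spec": {
        "containers": [{
            "name": "spark-kubernetes-executor",
            "volumeMounts": [{"name": "beam-data", "mountPath": "/opt/apache/beam/"}],
        }],
        "initContainers": [{
            "name": "init-beam",
            "image": "apache/beam_python3.7_sdk",
            "command": ["cp", "/opt/apache/beam/boot", "/init-container/data/boot"],
            "volumeMounts": [{"name": "beam-data", "mountPath": "/init-container/data"}],
        }],
        "volumes": [{"name": "beam-data", "emptyDir": {}}],
    }
}


def boot_path_in_executor(template, volume="beam-data"):
    """Return where the file copied by the init container appears in the executor.

    Assumes one init container and one executor container, as in the template.
    """
    spec = template["spec"]
    init = spec["initContainers"][0]
    executor = spec["containers"][0]
    if volume not in {v["name"] for v in spec["volumes"]}:
        raise ValueError(f"volume {volume!r} is not declared in spec.volumes")
    init_mount = next(m["mountPath"] for m in init["volumeMounts"] if m["name"] == volume)
    exec_mount = next(m["mountPath"] for m in executor["volumeMounts"] if m["name"] == volume)
    # The last argument of `cp` is the destination; it must lie inside the shared volume.
    relative = posixpath.relpath(init["command"][-1], init_mount)
    if relative.startswith(".."):
        raise ValueError("cp destination is outside the shared volume")
    # Re-root the destination at the executor's mount point of the same volume.
    return posixpath.join(exec_mount, relative)


if __name__ == "__main__":
    # This path should match the "command" given in --environment_config.
    print(boot_path_in_executor(EXECUTOR_TEMPLATE))
```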