This is an automated email from the ASF dual-hosted git repository.
lidongdai pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/seatunnel-website.git
The following commit(s) were added to refs/heads/main by this push:
new 323d0956c3b Update kubernetes.mdx and add spark on kubernetes (#273)
323d0956c3b is described below
commit 323d0956c3b929aa3152d431ece9eef5ed2278db
Author: Hao Xu <[email protected]>
AuthorDate: Sat Jan 6 01:58:53 2024 -0800
Update kubernetes.mdx and add spark on kubernetes (#273)
* Update kubernetes.mdx
There are some outdated resources in this page.
* add kubernetes for spark engine
* fix the issue
* Update kubernetes.mdx
---------
Co-authored-by: David Zollo <[email protected]>
---
.../start-v2/kubernetes/kubernetes.mdx | 271 ++++++++++++++-------
1 file changed, 179 insertions(+), 92 deletions(-)
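For reference while reading the patch below: the new Spark and Zeta manifests all mount a ConfigMap named `seatunnel-config`, which the docs create imperatively with `kubectl create cm`. A declarative equivalent is sketched here (an illustration only, not part of the patch; the embedded job config mirrors the FakeSource template that appears in the diff):

```yaml
# Sketch: declarative form of
#   kubectl create cm seatunnel-config \
#     --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: seatunnel-config
data:
  seatunnel.streaming.conf: |
    env {
      execution.parallelism = 2
      job.mode = "STREAMING"
      checkpoint.interval = 2000
    }
    source {
      FakeSource {
        parallelism = 2
        result_table_name = "fake"
        row.num = 16
        schema = {
          fields {
            name = "string"
            age = "int"
          }
        }
      }
    }
    sink {
      Console {
      }
    }
```

Applying this with `kubectl apply -f` keeps the ConfigMap under version control alongside the engine manifests.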
diff --git a/versioned_docs/version-2.3.3/start-v2/kubernetes/kubernetes.mdx
b/versioned_docs/version-2.3.3/start-v2/kubernetes/kubernetes.mdx
index 641793cdac3..00156e4cd71 100644
--- a/versioned_docs/version-2.3.3/start-v2/kubernetes/kubernetes.mdx
+++ b/versioned_docs/version-2.3.3/start-v2/kubernetes/kubernetes.mdx
@@ -36,33 +36,64 @@ To run the image with SeaTunnel, first create a
`Dockerfile`:
defaultValue="Zeta (local-mode)"
values={[
{label: 'Flink', value: 'flink'},
+ {label: 'Spark', value: 'spark'},
{label: 'Zeta (local-mode)', value: 'Zeta (local-mode)'},
{label: 'Zeta (cluster-mode)', value: 'Zeta (cluster-mode)'},
]}>
<TabItem value="flink">
```Dockerfile
-FROM flink:1.13
+FROM apache/flink:1.13
-ENV SEATUNNEL_VERSION="2.3.2"
+ENV SEATUNNEL_VERSION="2.3.3"
ENV SEATUNNEL_HOME="/opt/seatunnel"
RUN wget
https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
-RUN cd ${SEATUNNEL_HOME}||sh bin/install-plugin.sh ${SEATUNNEL_VERSION}
+RUN cd ${SEATUNNEL_HOME} && bash bin/install-plugin.sh ${SEATUNNEL_VERSION}
+```
+
+Then run the following commands to build the image:
+```bash
+docker build -t seatunnel:2.3.3-flink-1.13 -f Dockerfile .
+```
+Image `seatunnel:2.3.3-flink-1.13` needs to be present on the host (minikube)
so that the deployment can take place.
+
+Load image to minikube via:
+```bash
+minikube image load seatunnel:2.3.3-flink-1.13
+```
+
+</TabItem>
+
+<TabItem value="spark">
+
+```Dockerfile
+FROM apache/spark:3.3.3-scala2.12-java11-ubuntu
+
+USER root
+
+ENV SEATUNNEL_VERSION="2.3.3"
+ENV SEATUNNEL_HOME="/opt/seatunnel"
+
+RUN wget
https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
+RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
+RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
+
+RUN cd ${SEATUNNEL_HOME} && bash bin/install-plugin.sh ${SEATUNNEL_VERSION}
```
Then run the following commands to build the image:
```bash
-docker build -t seatunnel:2.3.0-flink-1.13 -f Dockerfile .
+docker build -t seatunnel:2.3.3-spark-3.3.3 -f Dockerfile .
```
-Image `seatunnel:2.3.0-flink-1.13` need to be present in the host (minikube)
so that the deployment can take place.
+Image `seatunnel:2.3.3-spark-3.3.3` needs to be present on the host (minikube)
so that the deployment can take place.
Load image to minikube via:
```bash
-minikube image load seatunnel:2.3.0-flink-1.13
+minikube image load seatunnel:2.3.3-spark-3.3.3
```
</TabItem>
@@ -132,6 +163,7 @@ minikube image load seatunnel:2.3.3
defaultValue="Zeta (local-mode)"
values={[
{label: 'Flink', value: 'flink'},
+ {label: 'Spark', value: 'spark'},
{label: 'Zeta (local-mode)', value: 'Zeta (local-mode)'},
{label: 'Zeta (cluster-mode)', value: 'Zeta (cluster-mode)'},
]}>
@@ -150,10 +182,10 @@ kubectl create -f
https://github.com/jetstack/cert-manager/releases/download/v1.
Now you can deploy the latest stable Flink Kubernetes Operator version using
the included Helm chart:
```bash
-helm repo add flink-operator-repo
https://downloads.apache.org/flink/flink-kubernetes-operator-1.3.1/
+helm repo add flink-kubernetes-operator-1.3.1
https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.3.1/
-helm install flink-kubernetes-operator
flink-operator-repo/flink-kubernetes-operator \
---set image.repository=apache/flink-kubernetes-operator
+helm install flink-kubernetes-operator
flink-kubernetes-operator-1.3.1/flink-kubernetes-operator \
+--set webhook.create=false --set
image.repository=apache/flink-kubernetes-operator
```
You may verify your installation via `kubectl`:
@@ -167,6 +199,36 @@ flink-kubernetes-operator-5f466b8549-mgchb 1/1
Running 3 (23h
</TabItem>
+<TabItem value="spark">
+
+The steps below provide a quick walk-through on setting up the Spark
Kubernetes Operator.
+You can refer to [Spark Kubernetes Operator - Quick
Start](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md)
for more details.
+
+> Notice: All the Kubernetes resources below are created in the default namespace.
+
+Install the certificate manager on your Kubernetes cluster to enable adding
the webhook component (only needed once per Kubernetes cluster):
+
+```bash
+kubectl create -f
https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml
+```
+Now you can deploy the latest stable Spark Kubernetes Operator version using
the included Helm chart:
+
+```bash
+helm repo add spark-operator
https://googlecloudplatform.github.io/spark-on-k8s-operator
+
+helm install seatunnel spark-operator/spark-operator \
+--set webhook.enable=true --set serviceAccounts.spark.name=spark
+```
+
+You may verify your installation via `kubectl`:
+
+```bash
+kubectl get pods
+NAME READY STATUS RESTARTS AGE
+seatunnel-spark-operator-57d966fdfc-v2mmw 1/1 Running 0 17s
+
+```
+</TabItem>
<TabItem value="Zeta (local-mode)">
none
@@ -181,16 +243,6 @@ none
**Run Application:** SeaTunnel already provides out-of-the-box
[configurations](https://github.com/apache/seatunnel/tree/dev/config).
-<Tabs
- groupId="engine-type"
- defaultValue="Zeta (local-mode)"
- values={[
- {label: 'Flink', value: 'flink'},
- {label: 'Zeta (local-mode)', value: 'Zeta (local-mode)'},
- {label: 'Zeta (cluster-mode)', value: 'Zeta (cluster-mode)'},
- ]}>
-<TabItem value="flink">
-
In this guide we are going to use
[seatunnel.streaming.conf](https://github.com/apache/seatunnel/blob/2.3.0-release/config/v2.streaming.conf.template):
```conf
@@ -237,6 +289,18 @@ kubectl create cm seatunnel-config \
--from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
```
+<Tabs
+ groupId="engine-type"
+ defaultValue="flink"
+ values={[
+ {label: 'Flink', value: 'flink'},
+ {label: 'Spark', value: 'spark'},
+ {label: 'Zeta (local-mode)', value: 'Zeta (local-mode)'},
+ {label: 'Zeta (cluster-mode)', value: 'Zeta (cluster-mode)'},
+ ]}>
+
+<TabItem value="flink">
+
Once the Flink Kubernetes Operator is running as seen in the previous steps
you are ready to submit a Flink (SeaTunnel) job:
- Create `seatunnel-flink.yaml` FlinkDeployment manifest:
```yaml
@@ -245,7 +309,7 @@ kind: FlinkDeployment
metadata:
name: seatunnel-flink-streaming-example
spec:
- image: seatunnel:2.3.0-flink-1.13
+ image: seatunnel:2.3.3-flink-1.13
flinkVersion: v1_13
flinkConfiguration:
taskmanager.numberOfTaskSlots: "2"
@@ -289,42 +353,66 @@ kubectl apply -f seatunnel-flink.yaml
</TabItem>
-<TabItem value="Zeta (local-mode)">
+<TabItem value="spark">
-In this guide we are going to use
[seatunnel.streaming.conf](https://github.com/apache/seatunnel/blob/2.3.0-release/config/v2.streaming.conf.template):
-
-```conf
-env {
- execution.parallelism = 2
- job.mode = "STREAMING"
- checkpoint.interval = 2000
-}
-
-source {
- FakeSource {
- parallelism = 2
- result_table_name = "fake"
- row.num = 16
- schema = {
- fields {
- name = "string"
- age = "int"
- }
- }
- }
-}
-
-sink {
- Console {
- }
-}
+Once the Spark Kubernetes Operator is running, as seen in the previous steps,
you are ready to submit a Spark (SeaTunnel) job:
+- Create a `seatunnel-spark.yaml` SparkApplication manifest:
+```yaml
+apiVersion: "sparkoperator.k8s.io/v1beta2"
+kind: SparkApplication
+metadata:
+ name: seatunnel-spark-streaming-example
+spec:
+ type: Java
+ mode: cluster
+ image: seatunnel:2.3.3-spark-3.3.3
+ imagePullPolicy: Always
+ mainClass: org.apache.seatunnel.core.starter.spark.SeaTunnelSpark
+ mainApplicationFile:
"local:///opt/seatunnel/starter/seatunnel-spark-3-starter.jar"
+ arguments: ["--config", "/data/seatunnel.streaming.conf"]
+ sparkVersion: "3.3.3"
+ restartPolicy:
+ type: Never
+ volumes:
+ - name: "test-volume"
+ hostPath:
+ path: "/tmp"
+ type: Directory
+ driver:
+ cores: 1
+ coreLimit: "1200m"
+ memory: "512m"
+    labels:
+      version: 3.3.3
+ serviceAccount: spark
+ configMaps:
+ - name: seatunnel-config
+ path: seatunnel.streaming.conf
+ volumeMounts:
+ - name: "test-volume"
+ mountPath: "/tmp"
+ executor:
+ cores: 1
+ instances: 1
+ memory: "512m"
+    labels:
+      version: 3.3.3
+ configMaps:
+ - name: seatunnel-config
+ path: seatunnel.streaming.conf
+ volumeMounts:
+ - name: "test-volume"
+ mountPath: "/tmp"
```
-Generate a configmap named seatunnel-config in Kubernetes for the
seatunnel.streaming.conf so that we can mount the config content in pod.
+- Run the example application:
```bash
-kubectl create cm seatunnel-config \
---from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
+kubectl apply -f seatunnel-spark.yaml
```
+</TabItem>
+
+<TabItem value="Zeta (local-mode)">
+
- Create `seatunnel.yaml`:
```yaml
apiVersion: v1
@@ -360,47 +448,10 @@ spec:
```bash
kubectl apply -f seatunnel.yaml
```
-
</TabItem>
-
<TabItem value="Zeta (cluster-mode)">
-In this guide we are going to use
[seatunnel.streaming.conf](https://github.com/apache/seatunnel/blob/2.3.0-release/config/v2.streaming.conf.template):
-
-```conf
-env {
- execution.parallelism = 2
- job.mode = "STREAMING"
- checkpoint.interval = 2000
-}
-
-source {
- FakeSource {
- parallelism = 2
- result_table_name = "fake"
- row.num = 16
- schema = {
- fields {
- name = "string"
- age = "int"
- }
- }
- }
-}
-
-sink {
- Console {
- }
-}
-```
-
-Generate a configmap named seatunnel-config in Kubernetes for the
seatunnel.streaming.conf so that we can mount the config content in pod.
-```bash
-kubectl create cm seatunnel-config \
---from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
-```
-
Then, we use the following command to load some configuration files used by
the seatunnel cluster into the configmap
Create the yaml file locally as follows
@@ -619,13 +670,14 @@ kubectl exec -it seatunnel-0 --
/opt/seatunnel/bin/seatunnel.sh --config /data
</Tabs>
-**See The Output**
+## See The Output
<Tabs
groupId="engine-type"
defaultValue="Zeta (local-mode)"
values={[
{label: 'Flink', value: 'flink'},
+ {label: 'Spark', value: 'spark'},
{label: 'Zeta (local-mode)', value: 'Zeta (local-mode)'},
{label: 'Zeta (cluster-mode)', value: 'Zeta (cluster-mode)'},
]}>
@@ -686,12 +738,47 @@ kubectl delete -f seatunnel-flink.yaml
```
</TabItem>
+<TabItem value="spark">
+
+You may follow the logs of your job; after a successful startup (which can
take on the order of a minute in a fresh environment, seconds afterwards) you
can:
+
+```bash
+kubectl logs -f seatunnel-spark-streaming-example-driver
+```
+looks like the below:
+
+```shell
+...
+23/11/13 04:52:45 INFO SparkContext: Running Spark version 3.3.3
+23/11/13 04:52:45 INFO ResourceUtils:
==============================================================
+23/11/13 04:52:45 INFO ResourceUtils: No custom resources configured for
spark.driver.
+23/11/13 04:52:45 INFO ResourceUtils:
==============================================================
+23/11/13 04:52:45 INFO SparkContext: Submitted application: SeaTunnel
+23/11/13 04:52:45 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: ,
memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name:
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name:
cpus, amount: 1.0)
+23/11/13 04:52:45 INFO ResourceProfile: Limiting resource is cpus at 1 tasks
per executor
+23/11/13 04:52:45 INFO ResourceProfileManager: Added ResourceProfile id: 0
+```
+
+
+To expose the Spark UI you may add a port-forward rule:
+```bash
+kubectl port-forward svc/seatunnel-spark-streaming-example-ui-svc 4040
+```
+Now the Spark UI is accessible at [localhost:4040](http://localhost:4040).
+
+To stop your job and delete your Spark application you can simply:
+
+```bash
+kubectl delete -f seatunnel-spark.yaml
+```
+</TabItem>
+
<TabItem value="Zeta (local-mode)">
You may follow the logs of your job, after a successful startup (which can
take on the order of a minute in a fresh environment, seconds afterwards) you
can:
```bash
-kubectl logs -f seatunnel
+kubectl logs -f seatunnel
```
looks like the below (your content may be different since we use `FakeSource`
to automatically generate random stream data):
@@ -723,7 +810,7 @@ looks like the below (your content may be different since
we use `FakeSource` to
```
-To stop your job and delete your FlinkDeployment you can simply:
+To stop your job and delete your SeaTunnel Deployment you can simply:
```bash
kubectl delete -f seatunnel.yaml
@@ -755,7 +842,7 @@ looks like the below (your content may be different since
we use `FakeSource` to
```
-To stop your job and delete your FlinkDeployment you can simply:
+To stop your job and delete your SeaTunnel Deployment you can simply:
```bash
kubectl delete -f seatunnel-cluster.yaml