This is an automated email from the ASF dual-hosted git repository.
wuchunfu pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new 156b087618 [Improve][Doc] Spark use doc update (#7722)
156b087618 is described below
commit 156b08761885e66c16036becb0900d0a3b108694
Author: Carl-Zhou-CN <[email protected]>
AuthorDate: Fri Sep 27 11:27:22 2024 +0800
[Improve][Doc] Spark use doc update (#7722)
Co-authored-by: Jia Fan <[email protected]>
---
docs/en/other-engine/spark.md | 108 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 108 insertions(+)
diff --git a/docs/en/other-engine/spark.md b/docs/en/other-engine/spark.md
index e69de29bb2..0d821defed 100644
--- a/docs/en/other-engine/spark.md
+++ b/docs/en/other-engine/spark.md
@@ -0,0 +1,108 @@
+# SeaTunnel Runs On Spark
+
+Spark is a powerful high-performance distributed calculate processing engine.
More information about it you can search for `Apache Spark`
+
+### Set Spark Configuration Information In The Job
+
+Example:
+I set some spark conf for this job
+
+```
+env {
+ spark.app.name = "example"
+ spark.sql.catalogImplementation = "hive"
+ spark.executor.memory= "2g"
+ spark.executor.instances = "2"
+ spark.yarn.priority = "100'
+ hive.exec.dynamic.partition.mode = "nonstrict"
+ spark.dynamicAllocation.enabled="false"
+}
+```
+
+### Command Line Example
+
+#### Spark on Yarn Cluster
+
+```
+./bin/start-seatunnel-spark-3-connector-v2.sh --master yarn --deploy-mode
cluster --config config/example.conf
+```
+
+#### Spark on Yarn Cluster
+
+```
+./bin/start-seatunnel-spark-3-connector-v2.sh --master yarn --deploy-mode
client --config config/example.conf
+```
+
+### How To Set Up A Simple Spark Job
+
+This is a simple job that runs on Spark. Randomly generated data is printed to
the console
+
+```
+env {
+ # common parameter
+ parallelism = 1
+
+ # spark special parameter
+ spark.app.name = "example"
+ spark.sql.catalogImplementation = "hive"
+ spark.executor.memory= "2g"
+ spark.executor.instances = "1"
+ spark.yarn.priority = "100"
+ hive.exec.dynamic.partition.mode = "nonstrict"
+ spark.dynamicAllocation.enabled="false"
+}
+
+source {
+ FakeSource {
+ schema = {
+ fields {
+ c_map = "map<string, array<int>>"
+ c_array = "array<int>"
+ c_string = string
+ c_boolean = boolean
+ c_tinyint = tinyint
+ c_smallint = smallint
+ c_int = int
+ c_bigint = bigint
+ c_float = float
+ c_double = double
+ c_decimal = "decimal(30, 8)"
+ c_null = "null"
+ c_bytes = bytes
+ c_date = date
+ c_timestamp = timestamp
+ c_row = {
+ c_map = "map<string, map<string, string>>"
+ c_array = "array<int>"
+ c_string = string
+ c_boolean = boolean
+ c_tinyint = tinyint
+ c_smallint = smallint
+ c_int = int
+ c_bigint = bigint
+ c_float = float
+ c_double = double
+ c_decimal = "decimal(30, 8)"
+ c_null = "null"
+ c_bytes = bytes
+ c_date = date
+ c_timestamp = timestamp
+ }
+ }
+ }
+}
+}
+
+transform {
+ # If you would like to get more information about how to configure seatunnel
and see full list of transform plugins,
+ # please go to https://seatunnel.apache.org/docs/transform-v2/sql
+}
+
+sink{
+ Console{}
+}
+```
+
+### How To Run A Job In A Project
+
+After you pull the code to the local, go to the
`seatunnel-examples/seatunnel-spark-connector-v2-example` module and find
`org.apache.seatunnel.example.spark.v2.SeaTunnelApiExample` to complete the
operation of the job.