This is an automated email from the ASF dual-hosted git repository.
tyrantlucifer pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new 705625c17 [Docs] Improve document (#3768)
705625c17 is described below
commit 705625c178210cba01a9e49265c2f9c366930594
Author: Eric <[email protected]>
AuthorDate: Mon Dec 19 21:10:13 2022 +0800
[Docs] Improve document (#3768)
---
docs/en/connector-v2/sink/Kafka.md | 57 +++++++
docs/en/connector-v2/source/kafka.md | 55 +++++++
docs/en/seatunnel-engine/checkpoint-storage.md | 7 +-
docs/en/seatunnel-engine/cluster-manager.md | 5 +
docs/en/seatunnel-engine/cluster-mode.md | 9 +-
docs/en/seatunnel-engine/deployment.md | 177 ++++++++++++++++++++-
docs/en/seatunnel-engine/local-mode.md | 5 +-
docs/en/seatunnel-engine/tcp.md | 36 +++++
docs/sidebars.js | 4 +-
plugins/README.md | 9 +-
.../json/files/schemas/seatunnel-schema-demo.json | 4 -
11 files changed, 356 insertions(+), 12 deletions(-)
diff --git a/docs/en/connector-v2/sink/Kafka.md
b/docs/en/connector-v2/sink/Kafka.md
index f86f6b25a..aa3f3b54b 100644
--- a/docs/en/connector-v2/sink/Kafka.md
+++ b/docs/en/connector-v2/sink/Kafka.md
@@ -120,6 +120,63 @@ sink {
}
```
+### AWS MSK SASL/SCRAM
+
+Replace the following `${username}` and `${password}` with the configuration
values in AWS MSK.
+
+```hocon
+sink {
+ kafka {
+ topic = "seatunnel"
+ bootstrap.servers = "localhost:9092"
+ partition = 3
+ format = json
+ kafka.request.timeout.ms = 60000
+ semantics = EXACTLY_ONCE
+ kafka.security.protocol=SASL_SSL
+ kafka.sasl.mechanism=SCRAM-SHA-512
+
kafka.sasl.jaas.config="org.apache.kafka.common.security.scram.ScramLoginModule
required \nusername=${username}\npassword=${password};"
+ }
+
+}
+```
+
+### AWS MSK IAM
+
+Download `aws-msk-iam-auth-1.1.5.jar` from
https://github.com/aws/aws-msk-iam-auth/releases and put it in
`$SEATUNNEL_HOME/plugin/kafka/lib` dir.
+
+Please ensure the IAM policy has `"kafka-cluster:Connect",`. Like this:
+
+
+```hocon
+"Effect": "Allow",
+"Action": [
+ "kafka-cluster:Connect",
+ "kafka-cluster:AlterCluster",
+ "kafka-cluster:DescribeCluster"
+],
+```
+
+Sink Config
+
+```hocon
+sink {
+ kafka {
+ topic = "seatunnel"
+ bootstrap.servers = "localhost:9092"
+ partition = 3
+ format = json
+ kafka.request.timeout.ms = 60000
+ semantics = EXACTLY_ONCE
+ kafka.security.protocol=SASL_SSL
+ kafka.sasl.mechanism=AWS_MSK_IAM
+ kafka.sasl.jaas.config="software.amazon.msk.auth.iam.IAMLoginModule
required;"
+
kafka.sasl.client.callback.handler.class="software.amazon.msk.auth.iam.IAMClientCallbackHandler"
+ }
+
+}
+```
+
## Changelog
### 2.3.0-beta 2022-10-20
diff --git a/docs/en/connector-v2/source/kafka.md
b/docs/en/connector-v2/source/kafka.md
index 2b6596173..96af58081 100644
--- a/docs/en/connector-v2/source/kafka.md
+++ b/docs/en/connector-v2/source/kafka.md
@@ -145,6 +145,61 @@ source {
}
```
+### AWS MSK SASL/SCRAM
+
+Replace the following `${username}` and `${password}` with the configuration
values in AWS MSK.
+
+```hocon
+source {
+ Kafka {
+ topic = "seatunnel"
+ bootstrap.servers =
"xx.amazonaws.com.cn:9096,xxx.amazonaws.com.cn:9096,xxxx.amazonaws.com.cn:9096"
+ consumer.group = "seatunnel_group"
+ kafka.security.protocol=SASL_SSL
+ kafka.sasl.mechanism=SCRAM-SHA-512
+
kafka.sasl.jaas.config="org.apache.kafka.common.security.scram.ScramLoginModule
required \nusername=${username}\npassword=${password};"
+ #kafka.security.protocol=SASL_SSL
+ #kafka.sasl.mechanism=AWS_MSK_IAM
+ #kafka.sasl.jaas.config="software.amazon.msk.auth.iam.IAMLoginModule
required;"
+
#kafka.sasl.client.callback.handler.class="software.amazon.msk.auth.iam.IAMClientCallbackHandler"
+ }
+}
+```
+
+### AWS MSK IAM
+
+Download `aws-msk-iam-auth-1.1.5.jar` from
https://github.com/aws/aws-msk-iam-auth/releases and put it in
`$SEATUNNEL_HOME/plugin/kafka/lib` dir.
+
+Please ensure the IAM policy has `"kafka-cluster:Connect",`. Like this:
+
+```hocon
+"Effect": "Allow",
+"Action": [
+ "kafka-cluster:Connect",
+ "kafka-cluster:AlterCluster",
+ "kafka-cluster:DescribeCluster"
+],
+```
+
+Source Config
+
+```hocon
+source {
+ Kafka {
+ topic = "seatunnel"
+ bootstrap.servers =
"xx.amazonaws.com.cn:9098,xxx.amazonaws.com.cn:9098,xxxx.amazonaws.com.cn:9098"
+ consumer.group = "seatunnel_group"
+ #kafka.security.protocol=SASL_SSL
+ #kafka.sasl.mechanism=SCRAM-SHA-512
+
#kafka.sasl.jaas.config="org.apache.kafka.common.security.scram.ScramLoginModule
required \nusername=${username}\npassword=${password};"
+ kafka.security.protocol=SASL_SSL
+ kafka.sasl.mechanism=AWS_MSK_IAM
+ kafka.sasl.jaas.config="software.amazon.msk.auth.iam.IAMLoginModule
required;"
+
kafka.sasl.client.callback.handler.class="software.amazon.msk.auth.iam.IAMClientCallbackHandler"
+ }
+}
+```
+
## Changelog
### 2.3.0-beta 2022-10-20
diff --git a/docs/en/seatunnel-engine/checkpoint-storage.md
b/docs/en/seatunnel-engine/checkpoint-storage.md
index de681b023..31fb4a6ba 100644
--- a/docs/en/seatunnel-engine/checkpoint-storage.md
+++ b/docs/en/seatunnel-engine/checkpoint-storage.md
@@ -1,3 +1,7 @@
+---
+sidebar_position: 7
+---
+
# Checkpoint Storage
## Introduction
Checkpoint is a fault-tolerant recovery mechanism. This mechanism ensures that
when the program is running, it can recover itself even if it suddenly
encounters an exception.
@@ -56,6 +60,7 @@ seatunnel:
plugin-config:
storage-type: s3
s3.bucket: your-bucket
+ fs.s3a.endpoint: your-endpoint
fs.s3a.access-key: your-access-key
fs.s3a.secret-key: your-secret-key
fs.s3a.aws.credentials.provider:
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
@@ -116,6 +121,6 @@ seatunnel:
max-retained: 3
plugin-config:
storage-type: hdfs
- fs.defaultFS: /tmp/ # Ensure that the directory has written
permission
+ fs.defaultFS: /tmp/ # Ensure that the directory has written
permission
```
diff --git a/docs/en/seatunnel-engine/cluster-manager.md
b/docs/en/seatunnel-engine/cluster-manager.md
new file mode 100644
index 000000000..bd13967e1
--- /dev/null
+++ b/docs/en/seatunnel-engine/cluster-manager.md
@@ -0,0 +1,5 @@
+---
+sidebar_position: 5
+---
+
+# SeaTunnel Engine Cluster Manager
\ No newline at end of file
diff --git a/docs/en/seatunnel-engine/cluster-mode.md
b/docs/en/seatunnel-engine/cluster-mode.md
index e7da4dabd..8a7a72106 100644
--- a/docs/en/seatunnel-engine/cluster-mode.md
+++ b/docs/en/seatunnel-engine/cluster-mode.md
@@ -1,6 +1,13 @@
---
-sidebar_position: 4
+sidebar_position: 3
---
# Run Job With Cluster Mode
+This is the most recommended way to use SeaTunnel Engine in the production
environment. Full functionality of SeaTunnel Engine is supported in this mode
and the cluster mode will have better performance and stability.
+
+In the cluster mode, the SeaTunnel Engine cluster needs to be deployed first,
and the client will submit the job to the SeaTunnel Engine cluster for running.
+
+## Deploy SeaTunnel Engine Cluster
+
+Deploy a SeaTunnel Engine Cluster reference [SeaTunnel Engine Cluster
Deploy](deployment.md)
\ No newline at end of file
diff --git a/docs/en/seatunnel-engine/deployment.md
b/docs/en/seatunnel-engine/deployment.md
index 20878057f..7deb082a8 100644
--- a/docs/en/seatunnel-engine/deployment.md
+++ b/docs/en/seatunnel-engine/deployment.md
@@ -1,8 +1,183 @@
---
-sidebar_position: 2
+sidebar_position: 4
---
# Deployment SeaTunnel Engine
+## 1. Download
+
SeaTunnel Engine is the default engine of SeaTunnel. The installation package
of SeaTunnel already contains all the contents of SeaTunnel Engine.
+## 2. Config SEATUNNEL_HOME
+
+You can config `SEATUNNEL_HOME` by adding the `/etc/profile.d/seatunnel.sh` file. The
content of `/etc/profile.d/seatunnel.sh` is
+
+```
+export SEATUNNEL_HOME=${seatunnel install path}
+export PATH=$PATH:$SEATUNNEL_HOME/bin
+```
+
+## 3. Config SeaTunnel Engine JVM options
+
+SeaTunnel Engine supports two ways to set JVM options.
+
+1. Add JVM Options to `$SEATUNNEL_HOME/bin/seatunnel-cluster.sh`.
+
+ Modify the `$SEATUNNEL_HOME/bin/seatunnel-cluster.sh` file and add
`JAVA_OPTS="-Xms2G -Xmx2G"` in the first line.
+2. Add JVM Options when start SeaTunnel Engine. For example
`seatunnel-cluster.sh -DJvmOption="-Xms2G -Xmx2G"`
+
+## 4. Config SeaTunnel Engine
+
+SeaTunnel Engine provides many functions, which need to be configured in
seatunnel.yaml.
+
+### 4.1 Backup count
+
+SeaTunnel Engine implements cluster management based on [Hazelcast
IMDG](https://docs.hazelcast.com/imdg/4.1/). The state data of the cluster (Job
Running State, Resource State) is stored in [Hazelcast
IMap](https://docs.hazelcast.com/imdg/4.1/data-structures/map).
+The data saved in Hazelcast IMap will be distributed and stored in all nodes
of the cluster. Hazelcast will partition the data stored in IMap. Each
partition can specify the number of backups.
+Therefore, SeaTunnel Engine can achieve cluster HA without using other
services(for example zookeeper).
+
+The `backup count` is to define the number of synchronous backups. For
example, if it is set to 1, backup of a partition will be placed on one other
member. If it is 2, it will be placed on two other members.
+
+We suggest setting the value of `backup-count` to `max(1, min(5, N/2))`. `N` is
the number of cluster nodes.
+
+```
+seatunnel:
+ engine:
+ backup-count: 1
+ # other config
+```
+
+### 4.2 Slot service
+
+The number of Slots determines the number of TaskGroups the cluster node can
run in parallel. SeaTunnel Engine is a data synchronization engine and most
jobs are IO intensive.
+
+Dynamic slot is suggested.
+
+```
+seatunnel:
+ engine:
+ slot-service:
+ dynamic-slot: true
+ # other config
+```
+
+### 4.3 Checkpoint Manager
+
+Like Flink, SeaTunnel Engine supports the Chandy–Lamport algorithm. Therefore,
SeaTunnel Engine can realize data synchronization without data loss and
duplication.
+
+**interval**
+
+The interval between two checkpoints, unit is milliseconds. If the
`checkpoint.interval` parameter is configured in the `env` of the job config
file, the value set here will be overwritten.
+
+**timeout**
+
+The timeout of a checkpoint. If a checkpoint cannot be completed within the
timeout period, a checkpoint failure will be triggered. Therefore, Job will be
restored.
+
+
+**max-concurrent**
+
+How many checkpoints can be performed simultaneously at most.
+
+**tolerable-failure**
+
+Maximum number of retries after checkpoint failure.
+
+Example
+
+```
+seatunnel:
+ engine:
+ backup-count: 1
+ print-execution-info-interval: 10
+ slot-service:
+ dynamic-slot: true
+ checkpoint:
+ interval: 300000
+ timeout: 10000
+ max-concurrent: 1
+ tolerable-failure: 2
+```
+
+**checkpoint storage**
+
+About the checkpoint storage, you can see [checkpoint
storage](checkpoint-storage.md)
+
+## 5. Config SeaTunnel Engine Server
+
+All SeaTunnel Engine Server config in `hazelcast.yaml` file.
+
+### 5.1 cluster-name
+
+The SeaTunnel Engine nodes use the cluster name to determine whether another
node belongs to the same cluster as themselves. If the cluster names of the two
nodes are different, the SeaTunnel Engine will reject the service request.
+
+### 5.2 Network
+
+Based on
[Hazelcast](https://docs.hazelcast.com/imdg/4.1/clusters/discovery-mechanisms),
A SeaTunnel Engine cluster is a network of cluster members that run SeaTunnel
Engine Server. Cluster members automatically join together to form a cluster.
This automatic joining takes place with various discovery mechanisms that the
cluster members use to find each other.
+
+Please note that, after a cluster is formed, communication between cluster
members is always via TCP/IP, regardless of the discovery mechanism used.
+
+SeaTunnel Engine uses the following discovery mechanisms.
+
+#### TCP
+
+You can configure SeaTunnel Engine to be a full TCP/IP cluster. See the
[Discovering Members by TCP section](tcp.md) for configuration details.
+
+An example is like this `hazelcast.yaml`
+
+```yaml
+hazelcast:
+ cluster-name: seatunnel
+ network:
+ join:
+ tcp-ip:
+ enabled: true
+ member-list:
+ - hostname1
+ port:
+ auto-increment: false
+ port: 5801
+ properties:
+ hazelcast.logging.type: log4j2
+```
+
+TCP is the suggested way in a standalone SeaTunnel Engine cluster.
+
+On the other hand, Hazelcast provides some other service discovery methods.
For details, please refer to [hazelcast
network](https://docs.hazelcast.com/imdg/4.1/clusters/setting-up-clusters)
+
+## 6. Config SeaTunnel Engine Client
+
+All SeaTunnel Engine Client config in `hazelcast-client.yaml`.
+
+### 6.1 cluster-name
+
+The Client must have the same `cluster-name` as the SeaTunnel Engine.
Otherwise, SeaTunnel Engine will reject the client request.
+
+### 6.2 Network
+
+**cluster-members**
+
+All SeaTunnel Engine Server node addresses need to be added here.
+
+```yaml
+hazelcast-client:
+ cluster-name: seatunnel
+ properties:
+ hazelcast.logging.type: log4j2
+ network:
+ cluster-members:
+ - hostname1:5801
+```
+
+## 7. Start SeaTunnel Engine Server Node
+
+```shell
+mkdir -p $SEATUNNEL_HOME/logs
+nohup seatunnel-cluster.sh &
+```
+
+The logs will be written to `$SEATUNNEL_HOME/logs/seatunnel-server.log`
+
+## 8. Install SeaTunnel Engine Client
+
+You only need to copy the `$SEATUNNEL_HOME` directory on the SeaTunnel Engine
node to the Client node and config the `SEATUNNEL_HOME` like SeaTunnel Engine
Server Node.
+
diff --git a/docs/en/seatunnel-engine/local-mode.md
b/docs/en/seatunnel-engine/local-mode.md
index 46dfc2543..2126483fe 100644
--- a/docs/en/seatunnel-engine/local-mode.md
+++ b/docs/en/seatunnel-engine/local-mode.md
@@ -1,6 +1,9 @@
---
-sidebar_position: 3
+sidebar_position: 2
---
# Run Job With Local Mode
+Only for testing.
+
+The most recommended way to use SeaTunnel Engine in the production environment
is [Cluster Mode](cluster-mode.md).
diff --git a/docs/en/seatunnel-engine/tcp.md b/docs/en/seatunnel-engine/tcp.md
new file mode 100644
index 000000000..7a8f67106
--- /dev/null
+++ b/docs/en/seatunnel-engine/tcp.md
@@ -0,0 +1,36 @@
+---
+sidebar_position: 6
+---
+
+# TCP Network
+
+If multicast is not the preferred way of discovery for your environment, then
you can configure SeaTunnel Engine to be a full TCP/IP cluster. When you
configure SeaTunnel Engine to discover members by TCP/IP, you must list all or
a subset of the members' host names and/or IP addresses as cluster members. You
do not have to list all of these cluster members, but at least one of the
listed members has to be active in the cluster when a new member joins.
+
+To configure your Hazelcast to be a full TCP/IP cluster, set the following
configuration elements. See the tcp-ip element section for the full
descriptions of the TCP/IP discovery configuration elements.
+
+- Set the enabled attribute of the tcp-ip element to true.
+- Provide your member elements within the tcp-ip element.
+
+The following is an example declarative configuration.
+
+```yaml
+hazelcast:
+ network:
+ join:
+ tcp-ip:
+ enabled: true
+ member-list:
+ - machine1
+ - machine2
+ - machine3:5799
+ - 192.168.1.0-7
+ - 192.168.1.21
+```
+
+As shown above, you can provide IP addresses or host names for member
elements. You can also give a range of IP addresses, such as `192.168.1.0-7`.
+
+Instead of providing members line-by-line as shown above, you also have the
option to use the members element and write comma-separated IP addresses, as
shown below.
+
+`<members>192.168.1.0-7,192.168.1.21</members>`
+
+If you do not provide ports for the members, Hazelcast automatically tries the
ports `5701`, `5702` and so on.
\ No newline at end of file
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 5eeaf74fa..3e2681e97 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -168,7 +168,9 @@ const sidebars = {
"seatunnel-engine/about",
"seatunnel-engine/deployment",
"seatunnel-engine/local-mode",
- "seatunnel-engine/cluster-mode"
+ "seatunnel-engine/cluster-mode",
+ "seatunnel-engine/checkpoint-storage",
+ "seatunnel-engine/tcp"
]
},
{
diff --git a/plugins/README.md b/plugins/README.md
index 3cde4d385..2c27ae773 100644
--- a/plugins/README.md
+++ b/plugins/README.md
@@ -1,6 +1,9 @@
# Introduction of plugins directory
-This directory used to store some plugin configuration files.
-- `json/files/schemas/` is the default schema store directory for [Json
transform
plugin](https://seatunnel.apache.org/docs/transform/json#schema_dir-string).
+This directory is used to store third-party jar packages that connectors
depend on at runtime, such as JDBC drivers.
-If you use spark cluster mode, this directory will be sent to the executor by
`--files`.
\ No newline at end of file
+## directory structure
+
+The jars a connector depends on need to be put in the `plugins/${connector name}/lib/`
dir.
+
+For example, JDBC driver jars need to be put in
`${seatunnel_install_home}/plugins/jdbc/lib/`
\ No newline at end of file
diff --git a/plugins/json/files/schemas/seatunnel-schema-demo.json
b/plugins/json/files/schemas/seatunnel-schema-demo.json
deleted file mode 100644
index 408920f0e..000000000
--- a/plugins/json/files/schemas/seatunnel-schema-demo.json
+++ /dev/null
@@ -1,4 +0,0 @@
-{
- "project":"seatunnel",
- "group":"apache"
-}
\ No newline at end of file