This is an automated email from the ASF dual-hosted git repository.
liugddx pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new da6cecd27 [Engine][Docs]Checkpoint docs (#3695)
da6cecd27 is described below
commit da6cecd27d50b1015510de63bdbd84c804f620c1
Author: Kirs <[email protected]>
AuthorDate: Mon Dec 12 12:14:59 2022 +0800
[Engine][Docs]Checkpoint docs (#3695)
---
docs/en/seatunnel-engine/checkpoint-storage.md | 121 +++++++++++++++++++++++++
1 file changed, 121 insertions(+)
diff --git a/docs/en/seatunnel-engine/checkpoint-storage.md
b/docs/en/seatunnel-engine/checkpoint-storage.md
new file mode 100644
index 000000000..537df44a2
--- /dev/null
+++ b/docs/en/seatunnel-engine/checkpoint-storage.md
@@ -0,0 +1,121 @@
+# Checkpoint Storage
+## Introduction
+Checkpoint is a fault-tolerance recovery mechanism: it ensures that a running program can recover by itself even if it suddenly encounters an exception.
+
+### Checkpoint Storage
+Checkpoint Storage is a storage mechanism for storing checkpoint data.
+
+SeaTunnel Engine supports the following checkpoint storage types:
+
+- HDFS (S3, HDFS, LocalFile)
+- LocalFile (native). Deprecated: use HDFS (LocalFile) instead.
+
+We use the microkernel design pattern to separate the checkpoint storage module from the engine. This allows users to implement their own checkpoint storage modules.
+
+`checkpoint-storage-api` is the checkpoint storage API module; it defines the interface that checkpoint storage implementations must provide.
+If you want to implement your own checkpoint storage module, you need to implement `CheckpointStorage` and provide a corresponding `CheckpointStorageFactory` implementation.
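As a sketch of what such a plugin might look like, here is a simplified, self-contained example. The interface shapes below are illustrative assumptions, not the actual `checkpoint-storage-api` signatures; the in-memory backend is a hypothetical stand-in for a real storage implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins for the checkpoint-storage-api interfaces;
// the real SeaTunnel signatures may differ.
interface CheckpointStorage {
    void storeCheckPoint(String jobId, byte[] state);
    byte[] restore(String jobId);
}

interface CheckpointStorageFactory {
    // identifier used as the plugin name in `checkpoint.storage.type`
    String factoryIdentifier();
    CheckpointStorage create(Map<String, String> configuration);
}

// Toy in-memory implementation standing in for a custom storage backend.
class InMemoryCheckpointStorage implements CheckpointStorage {
    private final Map<String, byte[]> store = new HashMap<>();

    @Override
    public void storeCheckPoint(String jobId, byte[] state) {
        store.put(jobId, state);
    }

    @Override
    public byte[] restore(String jobId) {
        return store.get(jobId);
    }
}

class InMemoryCheckpointStorageFactory implements CheckpointStorageFactory {
    @Override
    public String factoryIdentifier() {
        return "in-memory";
    }

    @Override
    public CheckpointStorage create(Map<String, String> configuration) {
        // a real plugin would read its settings from `configuration` here
        return new InMemoryCheckpointStorage();
    }
}
```

The engine would select a plugin by matching the configured `type` against the factory's identifier and pass the `plugin-config` map to `create`.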
+
+
+### Checkpoint Storage Configuration
+The configuration of the `seatunnel-server` module is in the `seatunnel.yaml`
file.
+```yaml
+seatunnel:
+  engine:
+    checkpoint:
+      storage:
+        type: hdfs # plugin name of checkpoint storage; we support hdfs (S3, local, hdfs). localfile (native local file) is the default, but this plugin is deprecated
+        # plugin configuration
+        plugin-config:
+          storageNameSpace: # checkpoint storage parent path; the default value is /seatunnel/checkpoint
+          K1: V1 # other plugin configuration
+          K2: V2 # other plugin configuration
+```
+#### S3
+S3 is based on hdfs-file, so you can refer to the [hadoop docs](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) to configure S3.
+
+Except when interacting with public S3 buckets, the S3A client needs credentials in order to interact with buckets.
+The client supports multiple authentication mechanisms and can be configured as to which mechanisms to use, and their order of use. Custom implementations of `com.amazonaws.auth.AWSCredentialsProvider` may also be used.
+If you use `SimpleAWSCredentialsProvider` (credentials can be obtained from the Amazon Security Token Service), these consist of an access key and a secret key.
+You can configure it like this:
+
+```yaml
+seatunnel:
+  engine:
+    checkpoint:
+      interval: 6000
+      timeout: 7000
+      max-concurrent: 5
+      tolerable-failure: 2
+      storage:
+        type: hdfs
+        max-retained: 3
+        plugin-config:
+          storage-type: s3
+          s3.bucket: your-bucket
+          fs.s3a.access-key: your-access-key
+          fs.s3a.secret-key: your-secret-key
+          fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
+```
+If you use `InstanceProfileCredentialsProvider`, which supports the use of instance profile credentials when running in an EC2 VM, you can check [iam-roles-for-amazon-ec2](https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html).
+You can configure it like this:
+```yaml
+seatunnel:
+  engine:
+    checkpoint:
+      interval: 6000
+      timeout: 7000
+      max-concurrent: 5
+      tolerable-failure: 2
+      storage:
+        type: hdfs
+        max-retained: 3
+        plugin-config:
+          storage-type: s3
+          s3.bucket: your-bucket
+          fs.s3a.endpoint: your-endpoint
+          fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.InstanceProfileCredentialsProvider
+```
+
+For additional reading on the Hadoop Credential Provider API see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
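One way to keep plain-text keys out of `seatunnel.yaml` is to store the secrets in a Hadoop credential store and reference it from the plugin configuration. This is a sketch assuming standard `hadoop-aws` behavior (the property `hadoop.security.credential.provider.path` is Hadoop's, not specific to SeaTunnel, and the JCEKS path is a hypothetical example):

```yaml
seatunnel:
  engine:
    checkpoint:
      storage:
        type: hdfs
        plugin-config:
          storage-type: s3
          s3.bucket: your-bucket
          # assumption: the access/secret keys live in this credential store
          # instead of appearing as fs.s3a.access-key / fs.s3a.secret-key here
          hadoop.security.credential.provider.path: jceks://hdfs@namenode/seatunnel/s3.jceks
```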
+#### HDFS
+If you use HDFS, you can configure it like this:
+```yaml
+seatunnel:
+  engine:
+    checkpoint:
+      storage:
+        type: hdfs
+        max-retained: 3
+        plugin-config:
+          storage-type: hdfs
+          fs.defaultFS: hdfs://localhost:9000
+          # if you use kerberos, you can configure it like this:
+          kerberosPrincipal: your-kerberos-principal
+          kerberosKeytab: your-kerberos-keytab
+```
+
+
+#### LocalFile
+```yaml
+seatunnel:
+  engine:
+    checkpoint:
+      interval: 6000
+      timeout: 7000
+      max-concurrent: 5
+      tolerable-failure: 2
+      storage:
+        type: hdfs
+        max-retained: 3
+        plugin-config:
+          storage-type: hdfs
+          fs.defaultFS: /tmp/ # ensure that the directory has write permission
+```