This is an automated email from the ASF dual-hosted git repository.
wanghailin pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new c09138ba96 [Feature][Zeta] Add COS support for checkpoint storage
(#7931)
c09138ba96 is described below
commit c09138ba9696584f38a48b15b4fc2d2e43b14183
Author: limin <[email protected]>
AuthorDate: Wed Oct 30 12:54:59 2024 +0800
[Feature][Zeta] Add COS support for checkpoint storage (#7931)
---
docs/en/seatunnel-engine/checkpoint-storage.md | 38 +++++++++++++++++++++-
docs/zh/seatunnel-engine/checkpoint-storage.md | 38 +++++++++++++++++++++-
.../checkpoint-storage-hdfs/pom.xml | 6 ++++
...ileConfiguration.java => CosConfiguration.java} | 35 +++++++++-----------
.../storage/hdfs/common/FileConfiguration.java | 3 +-
5 files changed, 98 insertions(+), 22 deletions(-)
diff --git a/docs/en/seatunnel-engine/checkpoint-storage.md
b/docs/en/seatunnel-engine/checkpoint-storage.md
index 7027f8067f..19c617e015 100644
--- a/docs/en/seatunnel-engine/checkpoint-storage.md
+++ b/docs/en/seatunnel-engine/checkpoint-storage.md
@@ -14,7 +14,7 @@ Checkpoint Storage is a storage mechanism for storing
checkpoint data.
SeaTunnel Engine supports the following checkpoint storage types:
-- HDFS (OSS,S3,HDFS,LocalFile)
+- HDFS (OSS,COS,S3,HDFS,LocalFile)
- LocalFile (native) (deprecated: use HDFS (LocalFile) instead).
We use the microkernel design pattern to separate the checkpoint storage
module from the engine. This allows users to implement their own checkpoint
storage modules.
@@ -73,6 +73,42 @@ For additional reading on the Hadoop Credential Provider
API, you can see: [Cred
For the Aliyun OSS Credential Provider implementations, see: [Auth Credential Providers](https://github.com/aliyun/aliyun-oss-java-sdk/tree/master/src/main/java/com/aliyun/oss/common/auth)
+#### COS
+
+Tencent COS storage is based on hdfs-file, so you can refer to the [Hadoop COS Docs](https://hadoop.apache.org/docs/stable/hadoop-cos/cloud-storage/) to configure COS.
+
+Except when interacting with public COS buckets, the COS client needs credentials to interact with buckets.
+The client supports multiple authentication mechanisms, and you can configure which mechanisms to use and in what order. Custom implementations of com.qcloud.cos.auth.COSCredentialsProvider may also be used.
+If you use SimpleCredentialsProvider, the credentials consist of a secretId and a secretKey, which can be obtained from Tencent Cloud API Key Management.
+You can configure it like this:
+
+```yaml
+seatunnel:
+  engine:
+    checkpoint:
+      interval: 6000
+      timeout: 7000
+      storage:
+        type: hdfs
+        max-retained: 3
+        plugin-config:
+          storage.type: cos
+          cos.bucket: cosn://your-bucket
+          fs.cosn.credentials.provider: org.apache.hadoop.fs.cosn.auth.SimpleCredentialsProvider
+          fs.cosn.userinfo.secretId: your-secretId
+          fs.cosn.userinfo.secretKey: your-secretKey
+          fs.cosn.bucket.region: your-region
+```
+
+For additional reading on the Hadoop Credential Provider API, see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
+
+For additional COS configuration, see: [Tencent Hadoop-COS Docs](https://doc.fincloud.tencent.cn/tcloud/Storage/COS/846365/hadoop)
+
+Please add the following jars to the lib directory:
+- [hadoop-cos-3.4.1.jar](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-cos/3.4.1)
+- [cos_api-bundle-5.6.69.jar](https://mvnrepository.com/artifact/com.qcloud/cos_api-bundle/5.6.69)
+- [hadoop-shaded-guava-1.1.1.jar](https://mvnrepository.com/artifact/org.apache.hadoop.thirdparty/hadoop-shaded-guava/1.1.1)
+
#### S3
S3 storage is based on hdfs-file, so you can refer to the [Hadoop S3 docs](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) to configure S3.
diff --git a/docs/zh/seatunnel-engine/checkpoint-storage.md
b/docs/zh/seatunnel-engine/checkpoint-storage.md
index 86165d5d3b..a60fdff5ae 100644
--- a/docs/zh/seatunnel-engine/checkpoint-storage.md
+++ b/docs/zh/seatunnel-engine/checkpoint-storage.md
@@ -12,7 +12,7 @@ sidebar_position: 7
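SeaTunnel Engine supports the following checkpoint storage types:
-- HDFS (OSS,S3,HDFS,LocalFile)
+- HDFS (OSS,COS,S3,HDFS,LocalFile)
- LocalFile (native) (deprecated: use HDFS (LocalFile) instead).
We use the microkernel design pattern to separate the checkpoint storage module from the engine. This allows users to implement their own checkpoint storage modules.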
@@ -71,6 +71,42 @@ seatunnel:
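For the Aliyun OSS credential provider implementations, see: [Auth Credential Providers](https://github.com/aliyun/aliyun-oss-java-sdk/tree/master/src/main/java/com/aliyun/oss/common/auth)
+#### COS
+
+Tencent Cloud COS storage is based on hdfs-file, so you can refer to the [Hadoop COS Docs](https://hadoop.apache.org/docs/stable/hadoop-cos/cloud-storage/) to configure COS.
+
+Except when interacting with public COS buckets, the COS client needs credentials to interact with buckets.
+The client supports multiple authentication mechanisms, and you can configure which mechanisms to use and in what order. Custom implementations of com.qcloud.cos.auth.COSCredentialsProvider may also be used.
+If you use SimpleCredentialsProvider, the credentials consist of a secretId and a secretKey, which can be obtained from Tencent Cloud API Key Management.
+You can configure it like this: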
+
+```yaml
+seatunnel:
+  engine:
+    checkpoint:
+      interval: 6000
+      timeout: 7000
+      storage:
+        type: hdfs
+        max-retained: 3
+        plugin-config:
+          storage.type: cos
+          cos.bucket: cosn://your-bucket
+          fs.cosn.credentials.provider: org.apache.hadoop.fs.cosn.auth.SimpleCredentialsProvider
+          fs.cosn.userinfo.secretId: your-secretId
+          fs.cosn.userinfo.secretKey: your-secretKey
+          fs.cosn.bucket.region: your-region
+```
+
+For more information on the Hadoop Credential Provider API, see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
+
+For Tencent Cloud COS configuration, see: [Tencent Hadoop-COS Docs](https://doc.fincloud.tencent.cn/tcloud/Storage/COS/846365/hadoop)
+
+Before use, please add the following jars to the lib directory:
+- [hadoop-cos-3.4.1.jar](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-cos/3.4.1)
+- [cos_api-bundle-5.6.69.jar](https://mvnrepository.com/artifact/com.qcloud/cos_api-bundle/5.6.69)
+- [hadoop-shaded-guava-1.1.1.jar](https://mvnrepository.com/artifact/org.apache.hadoop.thirdparty/hadoop-shaded-guava/1.1.1)
+
#### S3
S3 storage is based on hdfs-file, so you can refer to the [Hadoop S3 docs](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) to configure S3.
diff --git
a/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/pom.xml
b/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/pom.xml
index f7107f9f32..8ae75cddd5 100644
---
a/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/pom.xml
+++
b/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/pom.xml
@@ -65,6 +65,12 @@
             <version>1.11.271</version>
             <scope>provided</scope>
         </dependency>
+        <dependency>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-cos</artifactId>
+            <version>3.4.1</version>
+            <scope>provided</scope>
+        </dependency>
     </dependencies>
 </project>
diff --git
a/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/FileConfiguration.java
b/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/CosConfiguration.java
similarity index 50%
copy from
seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/FileConfiguration.java
copy to
seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/CosConfiguration.java
index a9b30346ed..56fdc74362 100644
---
a/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/FileConfiguration.java
+++
b/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/CosConfiguration.java
@@ -20,28 +20,25 @@
package org.apache.seatunnel.engine.checkpoint.storage.hdfs.common;
-public enum FileConfiguration {
-    LOCAL("local", new LocalConfiguration()),
-    HDFS("hdfs", new HdfsConfiguration()),
-    S3("s3", new S3Configuration()),
-    OSS("oss", new OssConfiguration());
+import org.apache.hadoop.conf.Configuration;
-    /** file system type */
-    private final String name;
+import java.util.Map;
-    /** file system configuration */
-    private final AbstractConfiguration configuration;
+import static org.apache.hadoop.fs.FileSystem.FS_DEFAULT_NAME_KEY;
-    FileConfiguration(String name, AbstractConfiguration configuration) {
-        this.name = name;
-        this.configuration = configuration;
-    }
-
-    public AbstractConfiguration getConfiguration(String name) {
-        return configuration;
-    }
+public class CosConfiguration extends AbstractConfiguration {
+    public static final String COS_BUCKET_KEY = "cos.bucket";
+    private static final String COS_IMPL_KEY = "fs.cosn.impl";
+    private static final String HDFS_COS_IMPL = "org.apache.hadoop.fs.cosn.CosNFileSystem";
+    private static final String COS_KEY = "fs.cosn.";
-    public String getName() {
-        return name;
+    @Override
+    public Configuration buildConfiguration(Map<String, String> config) {
+        checkConfiguration(config, COS_BUCKET_KEY);
+        Configuration hadoopConf = new Configuration();
+        hadoopConf.set(FS_DEFAULT_NAME_KEY, config.get(COS_BUCKET_KEY));
+        hadoopConf.set(COS_IMPL_KEY, HDFS_COS_IMPL);
+        setExtraConfiguration(hadoopConf, config, COS_KEY);
+        return hadoopConf;
     }
 }
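
As a reading aid, the mapping that `buildConfiguration` performs can be sketched without Hadoop on the classpath. This is an illustration under stated assumptions, not the committed code: a plain `Map` stands in for Hadoop's `Configuration`, `checkConfiguration` is assumed to reject a missing key, and `setExtraConfiguration` is assumed to copy every entry whose key starts with the given prefix (neither helper's body appears in this diff). The names `CosConfigSketch` and `build` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for CosConfiguration; uses a Map instead of Hadoop's Configuration.
class CosConfigSketch {
    static final String COS_BUCKET_KEY = "cos.bucket";
    static final String COS_IMPL_KEY = "fs.cosn.impl";
    static final String HDFS_COS_IMPL = "org.apache.hadoop.fs.cosn.CosNFileSystem";
    static final String COS_KEY = "fs.cosn.";
    // Value of Hadoop's FileSystem.FS_DEFAULT_NAME_KEY.
    static final String FS_DEFAULT_NAME_KEY = "fs.defaultFS";

    static Map<String, String> build(Map<String, String> pluginConfig) {
        // Assumed behavior of checkConfiguration: the bucket key is mandatory.
        String bucket = pluginConfig.get(COS_BUCKET_KEY);
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException(COS_BUCKET_KEY + " is required");
        }
        Map<String, String> hadoopConf = new LinkedHashMap<>();
        hadoopConf.put(FS_DEFAULT_NAME_KEY, bucket);
        hadoopConf.put(COS_IMPL_KEY, HDFS_COS_IMPL);
        // Assumed behavior of setExtraConfiguration: pass through keys with the fs.cosn. prefix.
        pluginConfig.forEach((key, value) -> {
            if (key.startsWith(COS_KEY)) {
                hadoopConf.put(key, value);
            }
        });
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> pluginConfig = new LinkedHashMap<>();
        pluginConfig.put("storage.type", "cos");
        pluginConfig.put("cos.bucket", "cosn://your-bucket");
        pluginConfig.put("fs.cosn.userinfo.secretId", "your-secretId");
        pluginConfig.put("fs.cosn.bucket.region", "your-region");
        build(pluginConfig).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Under these assumptions, `storage.type` and `cos.bucket` themselves are not forwarded as Hadoop keys; only the bucket URI (as the default file system), the file system implementation class, and the `fs.cosn.*` entries are.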
diff --git
a/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/FileConfiguration.java
b/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/FileConfiguration.java
index a9b30346ed..a1904e5fcb 100644
---
a/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/FileConfiguration.java
+++
b/seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common/FileConfiguration.java
@@ -24,7 +24,8 @@ public enum FileConfiguration {
     LOCAL("local", new LocalConfiguration()),
     HDFS("hdfs", new HdfsConfiguration()),
     S3("s3", new S3Configuration()),
-    OSS("oss", new OssConfiguration());
+    OSS("oss", new OssConfiguration()),
+    COS("cos", new CosConfiguration());
     /** file system type */
     private final String name;
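
With the new constant in place, a `storage.type: cos` entry can be matched to `COS`. How the engine performs that match is not shown in this diff, so the following is only a minimal sketch under assumptions: `StorageKind` is a hypothetical, trimmed-down copy of `FileConfiguration` (without the configuration objects), and `resolve` is a hypothetical helper that compares the configured string against each constant's `name` field, ignoring case.

```java
import java.util.Locale;

// Hypothetical illustration; StorageKind mirrors only the names in FileConfiguration.
class StorageTypeLookupSketch {
    enum StorageKind {
        LOCAL("local"),
        HDFS("hdfs"),
        S3("s3"),
        OSS("oss"),
        COS("cos");

        /** file system type */
        private final String name;

        StorageKind(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Assumed lookup: match the configured storage.type against each constant's name.
    static StorageKind resolve(String storageType) {
        String wanted = storageType.trim().toLowerCase(Locale.ROOT);
        for (StorageKind kind : StorageKind.values()) {
            if (kind.getName().equals(wanted)) {
                return kind;
            }
        }
        throw new IllegalArgumentException("Unsupported storage.type: " + storageType);
    }

    public static void main(String[] args) {
        System.out.println(resolve("cos"));
    }
}
```

In this sketch an unknown value fails fast with an `IllegalArgumentException`, which is one reasonable design choice; the real engine may handle it differently.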