Re: [PR] [Improve][Docs] Supplement Chinese documentation for SeaTunnel connectors [seatunnel]

via GitHub Mon, 24 Nov 2025 22:02:36 -0800


yzeng1618 commented on code in PR #10109:
URL: https://github.com/apache/seatunnel/pull/10109#discussion_r2558650098



##########
docs/zh/connector-v2/source/Iceberg.md:
##########
@@ -0,0 +1,230 @@
+import ChangeLog from '../changelog/connector-iceberg.md';
+
+# Apache Iceberg
+
+> Apache Iceberg source connector
+
+## Supported Iceberg Version
+
+- 1.6.1
+
+## Supported Engines
+
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
+
+## Key Features
+
+- [x] [batch](../../concept/connector-v2-features.md)
+- [x] [stream](../../concept/connector-v2-features.md)
+- [x] [exactly-once](../../concept/connector-v2-features.md)
+- [x] [column projection](../../concept/connector-v2-features.md)
+- [x] [parallelism](../../concept/connector-v2-features.md)
+- [ ] [support user-defined split](../../concept/connector-v2-features.md)
+- [x] data format
+  - [x] parquet
+  - [x] orc
+  - [x] avro
+- [x] iceberg catalog
+  - [x] hadoop(2.7.1 , 2.7.5 , 3.1.3)
+  - [x] hive(2.3.9 , 3.1.2)
+
+## Description
+
+Source connector for Apache Iceberg. It supports both batch and stream mode.
+
+## Supported Data Source Info
+
+| Data source | Dependency | Maven |
+|-------------|------------|-------|
+| Iceberg | hive-exec | [Download](https://mvnrepository.com/artifact/org.apache.hive/hive-exec) |
+| Iceberg | libfb303 | [Download](https://mvnrepository.com/artifact/org.apache.thrift/libfb303) |
+
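+## Database Dependency
+
+> To stay compatible with different Hadoop and Hive versions, the scope of hive-exec in the project pom file is provided. If you use the Flink engine, you may first need to add the following Jar packages to the <FLINK_HOME>/lib directory; if you use the Spark engine integrated with Hadoop, you do not need to add them. If you use the hadoop s3 catalog, you need to add the hadoop-aws and aws-java-sdk jars for your Flink and Spark engine versions. (Additional locations: <FLINK_HOME>/lib, <SPARK_HOME>/jars)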
+
+```
+hive-exec-xxx.jar
+libfb303-xxx.jar
+```
+
+> Some versions of the hive-exec package do not include libfb303-xxx.jar, so you also need to import that Jar package manually.
+
+## Data Type Mapping
+
+| Iceberg Data Type | SeaTunnel Data Type |
+|-------------------|---------------------|
+| BOOLEAN           | BOOLEAN             |
+| INTEGER           | INT                 |
+| LONG              | BIGINT              |
+| FLOAT             | FLOAT               |
+| DOUBLE            | DOUBLE              |
+| DATE              | DATE                |
+| TIME              | TIME                |
+| TIMESTAMP         | TIMESTAMP           |
+| STRING            | STRING              |
+| FIXED<br/>BINARY  | BYTES               |
+| DECIMAL           | DECIMAL             |
+| STRUCT            | ROW                 |
+| LIST              | ARRAY               |
+| MAP               | MAP                 |
+
+## Source Options
+
+| Name | Type | Required | Default | Description |
+|------|------|----------|---------|-------------|
+| catalog_name | string | yes | - | User-specified catalog name. |
+| namespace | string | yes | - | The iceberg database name in the backend catalog. |
+| table | string | no | - | The iceberg table name in the backend catalog. |
+| table_list | string | no | - | The iceberg table list in the backend catalog. |
+| iceberg.catalog.config | map | yes | - | Properties for initializing the Iceberg catalog; the available keys are listed in this file: https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/CatalogProperties.java |
+| hadoop.config | map | no | - | Properties passed through to the Hadoop configuration. |
+| iceberg.hadoop-conf-path | string | no | - | The path from which the 'core-site.xml', 'hdfs-site.xml', and 'hive-site.xml' files are loaded. |
+| schema | config | no | - | Use projection to select data columns and column order. |
+| case_sensitive | boolean | no | false | If data columns are selected via schema [config], controls whether matching against the schema is case sensitive. |
+| start_snapshot_timestamp | long | no | - | Instructs this scan to look for changes starting from the most recent snapshot of the table as of the given timestamp.<br/>timestamp: milliseconds since the Unix epoch |
+| start_snapshot_id | long | no | - | Instructs this scan to look for changes starting from a particular snapshot (exclusive). |
+| end_snapshot_id | long | no | - | Instructs this scan to look for changes up to a particular snapshot (inclusive). |
+| use_snapshot_id | long | no | - | Instructs this scan to use the given snapshot ID. |
+| use_snapshot_timestamp | long | no | - | Instructs this scan to use the most recent snapshot as of the given time.<br/>timestamp: milliseconds since the Unix epoch |
+| stream_scan_strategy | enum | no | FROM_LATEST_SNAPSHOT | Starting strategy for stream mode execution. Defaults to `FROM_LATEST_SNAPSHOT` if no value is specified. The optional values are:<br/>TABLE_SCAN_THEN_INCREMENTAL: do a regular table scan, then switch to incremental mode.<br/>FROM_LATEST_SNAPSHOT: start incremental mode from the latest snapshot (inclusive).<br/>FROM_EARLIEST_SNAPSHOT: start incremental mode from the earliest snapshot (inclusive).<br/>FROM_SNAPSHOT_ID: start incremental mode from the snapshot with a specific id (inclusive).<br/>FROM_SNAPSHOT_TIMESTAMP: start incremental mode from the snapshot with a specific timestamp (inclusive). |
+| increment.scan-interval | long | no | 2000 | The interval of the incremental scan, in milliseconds. |
+| common-options | | no | - | Source plugin common parameters; see [Source Common Options](../source-common-options.md) for details. |
+| query | String | no | - | The select DML used to select iceberg data. It cannot contain the table name and does not support aliases. For example: `select * from table where f1 > 100`, `select fn from table where f1 > 100`. Support for the LIKE syntax is currently limited: the LIKE clause should not start with `%`. A supported form is: `select f1 from t where f2 like 'tom%'` |
+
+
+## Task Example
+
+### Simple
+
+```hocon
+env {
+  parallelism = 2
+  job.mode = "BATCH"
+}
+
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    iceberg.catalog.config={
+      type = "hadoop"
+      warehouse = "file:///tmp/seatunnel/iceberg/hadoop/"
+    }
+    namespace = "database1"
+    table = "source"
+    query = "select fn from table where f1 > 100"
+    plugin_output = "iceberg"
+  }
+}
+
+transform {
+}
+
+sink {
+  Console {
+    plugin_input = "iceberg"
+  }
+}
+```
+
+### Multi-table read
+
+```hocon
+source {
+  Iceberg {
+    catalog_name = "seatunnel"
+    iceberg.catalog.config = {
+      type = "hadoop"
+      warehouse = "file:///tmp/seatunnel/iceberg/hadoop/"
+    }
+    namespace = "database1"
+    table_list = [
+      {
+        table = "table_1

Review Comment:
   Thanks for the correction, merged the suggestion



##########
docs/zh/connector-v2/source/InfluxDB.md:
##########
@@ -0,0 +1,194 @@
+import ChangeLog from '../changelog/connector-influxdb.md';
+
+# InfluxDB
+
+> InfluxDB source connector
+
+## Description
+
+Read external data source data through InfluxDB.
+
+## Key Features
+
+- [x] [batch](../../concept/connector-v2-features.md)
+- [ ] [stream](../../concept/connector-v2-features.md)
+- [x] [exactly-once](../../concept/connector-v2-features.md)
+- [x] [column projection](../../concept/connector-v2-features.md)
+
+Supports query SQL, which can be used to achieve projection.
+
+- [x] [parallelism](../../concept/connector-v2-features.md)
+- [ ] [support user-defined split](../../concept/connector-v2-features.md)
+
+## Options
+
+| Name | Type | Required | Default | Description |
+|------|------|----------|---------|-------------|
+| url | string | yes | - | InfluxDB connection URL |
+| sql | string | yes | - | The query SQL used to search data |
+| schema | config | yes | - | The schema information of upstream data |
+| database | string | yes | - | The InfluxDB database |
+| username | string | no | - | The InfluxDB username |
+| password | string | no | - | The InfluxDB password |
+| lower_bound | long | no | - | Lower bound of split_column |
+| upper_bound | long | no | - | Upper bound of split_column |
+| partition_num | int | no | - | Number of partitions |
+| split_column | string | no | - | The split column |
+| epoch | string | no | n | Returned time precision |
+| connect_timeout_ms | long | no | 15000 | Timeout for connecting to InfluxDB, in milliseconds |
+| query_timeout_sec | int | no | 3 | Query timeout, in seconds |
+| common-options | config | no | - | Source plugin common parameters |
+
+### url [string]
+
+The URL used to connect to InfluxDB, for example:
+
+```
+http://influxdb-host:8086
+```
+
+### sql [string]
+
+The query SQL used to search data
+
+```
+select name,age from test
+```
+
+### schema [config]
+
+#### fields [Config]
+
+The schema information of upstream data, for example:
+
+```
+schema {
+    fields {
+        name = string
+        age = int
+    }
+}
+```
+
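+### database [string]
+
+The InfluxDB database
+
+### username [string]
+
+The InfluxDB username
+
+### password [string]
+
+The InfluxDB password
+
+### split_column [string]
+
+The split column of InfluxDB
+
+> Tips:
+> - InfluxDB tags are not supported as the split key, because tags can only be strings
+> - InfluxDB time is not supported as the split key, because the time field cannot take part in mathematical calculation
+> - Currently, `split_column` only supports splitting on integer data; `float`, `string`, `date`, and other types are not supported.
+
+### upper_bound [long]
+
+Upper bound of the `split_column` column
+
+### lower_bound [long]
+
+Lower bound of the `split_column` column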
+
+```
+     divide the $split_column range into $partition_num parts
+     if partition_num is 1, use the whole `split_column` range
+     if partition_num < (upper_bound - lower_bound), use (upper_bound - lower_bound) partitions
+     
+     e.g.: lower_bound = 1, upper_bound = 10, partition_num = 2
+     sql = "select * from test where age > 0 and age < 10"
+     
+     split result
+
+     split 1: select * from test where ($split_column >= 1 and $split_column < 6) and (  age > 0 and age < 10 )
+     
+     split 2: select * from test where ($split_column >= 6 and $split_column < 11) and (  age > 0 and age < 10 )
+
+```
+
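+### partition_num [int]
+
+The number of partitions for InfluxDB
+
+> Tips: Make sure that `upper_bound` minus `lower_bound` is evenly divisible by `partition_num`; otherwise the query results will overlap
+
+### epoch [string]
+
+Returned time precision
+- Optional values: H, m, s, MS, u, n
+- Default value: n
+
+### query_timeout_sec [int]
+
+The query timeout of InfluxDB, in seconds
+
+### connect_timeout_ms [long]
+
+The timeout for connecting to InfluxDB, in milliseconds
+
+### common options
+
+Source plugin common parameters; see [Source Common Options](../source-common-options.md) for details.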
+
+## Examples
+
+Example of multi-parallelism and multi-partition scanning
+
+```hocon
+source {
+
+    InfluxDB {
+        url = "http://influxdb-host:8086"
+        sql = "select label, value, rt, time from test"
+        database = "test"
+        upper_bound = 100
+        lower_bound = 1
+        partition_num = 4
+        split_column = "value"
+        schema {
+            fields {
+                label = STRING
+                value = INT
+                rt = STRING
+                time = BIGINT
+            }
+        }
+    }
+
+}
+
+```
+
+Example without partition scanning
+
+```hocon
+source {
+
+    InfluxDB {
+        url = "http://influxdb-host:8086"
+        sql = "select label, value, rt, time from test"
+        database = "test"
+        schema {
+            fields {
+                label = STRING
+                value = INT
+                rt = STRING
+                time = BIGINT
+            }
+    }

Review Comment:
   Thanks for the correction, merged the suggestion
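
   For readers following the `lower_bound` / `upper_bound` text in the InfluxDB.md hunk above, here is a minimal Python sketch that reproduces the worked example in that text (lower_bound = 1, upper_bound = 10, partition_num = 2). The function names `split_ranges` and `split_queries` are hypothetical, and the rounding rule (the last range absorbs any remainder) is an assumption made for illustration; this is not the connector's actual implementation.

```python
from typing import List, Tuple


def split_ranges(lower_bound: int, upper_bound: int, partition_num: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower_bound, upper_bound] into half-open [start, end) ranges.

    Assumption for this sketch: the last range absorbs any remainder.
    """
    if partition_num < 1:
        raise ValueError("partition_num must be >= 1")
    if upper_bound < lower_bound:
        raise ValueError("upper_bound must be >= lower_bound")

    total = upper_bound - lower_bound + 1
    # Never create more partitions than there are integer values to cover.
    parts = min(partition_num, total)
    step = total // parts

    ranges = []
    start = lower_bound
    for i in range(parts):
        # The final range always ends just past upper_bound.
        end = upper_bound + 1 if i == parts - 1 else start + step
        ranges.append((start, end))
        start = end
    return ranges


def split_queries(column: str, condition: str, ranges: List[Tuple[int, int]]) -> List[str]:
    """Build one query per range, mirroring the shape of the doc's split result."""
    return [
        f"select * from test where ({column} >= {start} and {column} < {end}) and ( {condition} )"
        for start, end in ranges
    ]


if __name__ == "__main__":
    for query in split_queries("age", "age > 0 and age < 10", split_ranges(1, 10, 2)):
        print(query)
```

   With the doc's inputs, `split_ranges(1, 10, 2)` yields the two ranges shown in the split result, [1, 6) and [6, 11). Half-open ranges are used so that adjacent splits never select the same row twice.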


