This is an automated email from the ASF dual-hosted git repository.
gaojun2048 pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new 07da889d2 [Doc][Improve]add transform v2 doc & remove transform v1 doc
(#3786)
07da889d2 is described below
commit 07da889d287566f82a9505b6fa4772950c5aae61
Author: Eric <[email protected]>
AuthorDate: Fri Dec 30 14:25:56 2022 +0800
[Doc][Improve]add transform v2 doc & remove transform v1 doc (#3786)
---
README.md | 2 +-
docs/en/about.md | 4 +-
docs/en/concept/config.md | 52 +++++-
docs/en/connector-v2/sink/Console.md | 6 -
docs/en/connector-v2/sink/Hive.md | 3 -
docs/en/connector-v2/sink/Socket.md | 4 -
docs/en/connector-v2/sink/common-options.md | 19 +-
docs/en/connector-v2/source/Socket.md | 3 -
docs/en/connector-v2/source/common-options.md | 2 +-
docs/en/seatunnel-engine/deployment.md | 2 +-
docs/en/start-v2/locally/quick-start-flink.md | 4 -
.../locally/quick-start-seatunnel-engine.md | 6 +-
docs/en/start-v2/locally/quick-start-spark.md | 4 -
docs/en/transform-v2/common-options.md | 23 +++
docs/en/transform-v2/copy.md | 66 +++++++
docs/en/transform-v2/filter-rowkind.md | 67 +++++++
docs/en/transform-v2/filter.md | 60 +++++++
docs/en/transform-v2/replace.md | 121 +++++++++++++
docs/en/transform-v2/split.md | 72 ++++++++
docs/en/transform/common-options.mdx | 118 ------------
docs/en/transform/json.md | 197 ---------------------
docs/en/transform/nullRate.md | 69 --------
docs/en/transform/nulltf.md | 75 --------
docs/en/transform/replace.md | 81 ---------
docs/en/transform/split.mdx | 124 -------------
docs/en/transform/sql.md | 61 -------
docs/en/transform/udf.md | 44 -----
docs/en/transform/uuid.md | 64 -------
docs/sidebars.js | 12 +-
seatunnel-connectors-v2/README.md | 14 +-
.../engine/e2e/ClusterFaultToleranceIT.java | 8 +-
.../e2e/ClusterFaultToleranceTwoPipelineIT.java | 6 +-
.../seatunnel/transform/FilterFieldTransform.java | 2 +-
33 files changed, 492 insertions(+), 903 deletions(-)
diff --git a/README.md b/README.md
index 8af46a85e..84feaabb2 100644
--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@ The default engine use by SeaTunnel is [SeaTunnel
Engine](seatunnel-engine/READM
- Sink Connectors supported [check
out](https://seatunnel.apache.org/docs/category/sink-v2)
-- Transform supported [check
out](https://seatunnel.apache.org/docs/transform/common-options/)
+- Transform supported [check out](docs/en/transform-v2)
### Here's a list of our connectors with their health status.[connector
status](docs/en/Connector-v2-release-state.md)
diff --git a/docs/en/about.md b/docs/en/about.md
index 6a530df38..aba24a21c 100644
--- a/docs/en/about.md
+++ b/docs/en/about.md
@@ -24,7 +24,7 @@ SeaTunnel focuses on data integration and data
synchronization, and is mainly de
## Features of SeaTunnel
- Rich and extensible Connector: SeaTunnel provides a Connector API that does
not depend on a specific execution engine. Connectors (Source, Transform, Sink)
developed based on this API can run On many different engines, such as
SeaTunnel Engine, Flink, Spark that are currently supported.
-- Connector plug-in: The plug-in design allows users to easily develop their
own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel
has supported more than 70 Connectors, and the number is surging. There is the
list of the [currently-supported connectors](Connector-v2-release-state.md)
+- Connector plug-in: The plug-in design allows users to easily develop their
own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel
has supported more than 100 Connectors, and the number is surging. There is the
list of the [currently-supported connectors](Connector-v2-release-state.md)
- Batch-stream integration: Connectors developed based on SeaTunnel Connector
API are perfectly compatible with offline synchronization, real-time
synchronization, full- synchronization, incremental synchronization and other
scenarios. It greatly reduces the difficulty of managing data integration tasks.
- Support distributed snapshot algorithm to ensure data consistency.
- Multi-engine support: SeaTunnel uses SeaTunnel Engine for data
synchronization by default. At the same time, SeaTunnel also supports the use
of Flink or Spark as the execution engine of the Connector to adapt to the
existing technical components of the enterprise. SeaTunnel supports multiple
versions of Spark and Flink.
@@ -51,7 +51,7 @@ The default engine use by SeaTunnel is [SeaTunnel
Engine](seatunnel-engine/about
- **Source Connectors** SeaTunnel support read data from various relational
databases, graph databases, NoSQL databases, document databases, and memory
databases. Various distributed file systems such as HDFS. A variety of cloud
storage, such as S3 and OSS. At the same time, we also support data reading of
many common SaaS services. You can access the detailed list
[here](connector-v2/source). If you want, You can develop your own source
connector and easily integrate it into seatunnel.
-- **Transform Connector**
+- **Transform Connector** If the schema differs between source and sink, you can use a Transform Connector to change the schema read from the source so that it matches the sink schema.
- **Sink Connector** SeaTunnel support write data to various relational
databases, graph databases, NoSQL databases, document databases, and memory
databases. Various distributed file systems such as HDFS. A variety of cloud
storage, such as S3 and OSS. At the same time, we also support write data to
many common SaaS services. You can access the detailed list
[here](connector-v2/sink). If you want, You can develop your own sink connector
and easily integrate it into seatunnel.
diff --git a/docs/en/concept/config.md b/docs/en/concept/config.md
index cc8f7a5db..ba49606e6 100644
--- a/docs/en/concept/config.md
+++ b/docs/en/concept/config.md
@@ -20,19 +20,28 @@ The Config file will be similar to the one below.
```hocon
env {
- execution.parallelism = 1
+ job.mode = "BATCH"
}
source {
FakeSource {
result_table_name = "fake"
- field_name = "name,age"
+ row.num = 100
+ schema = {
+ fields {
+ name = "string"
+ age = "int"
+ card = "int"
+ }
+ }
}
}
transform {
- sql {
- sql = "select name,age from fake"
+ Filter {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ fields = [name, card]
}
}
@@ -41,9 +50,10 @@ sink {
host = "clickhouse:8123"
database = "default"
table = "seatunnel_console"
- fields = ["name"]
+ fields = ["name", "card"]
username = "default"
password = ""
+ source_table_name = "fake1"
}
}
```
@@ -74,13 +84,39 @@ course, this uses the word 'may', which means that we can
also directly treat th
directly from source to sink. Like below.
```hocon
-transform {
- // no thing on here
+env {
+ job.mode = "BATCH"
+}
+
+source {
+ FakeSource {
+ result_table_name = "fake"
+ row.num = 100
+ schema = {
+ fields {
+ name = "string"
+ age = "int"
+ card = "int"
+ }
+ }
+ }
+}
+
+sink {
+ Clickhouse {
+ host = "clickhouse:8123"
+ database = "default"
+ table = "seatunnel_console"
+ fields = ["name", "age", "card"]
+ username = "default"
+ password = ""
+ source_table_name = "fake"
+ }
}
```
Like source, transform has specific parameters that belong to each module. The
supported source at now check.
-The supported transform at now check [Transform of SeaTunnel](../transform)
+For the transforms currently supported, check [Transform V2 of SeaTunnel](../transform-v2)
### sink
diff --git a/docs/en/connector-v2/sink/Console.md
b/docs/en/connector-v2/sink/Console.md
index 743246b37..134b74c85 100644
--- a/docs/en/connector-v2/sink/Console.md
+++ b/docs/en/connector-v2/sink/Console.md
@@ -54,12 +54,6 @@ source {
}
}
-transform {
- sql {
- sql = "select name, age from fake"
- }
-}
-
sink {
Console {
diff --git a/docs/en/connector-v2/sink/Hive.md
b/docs/en/connector-v2/sink/Hive.md
index 607a20a0a..63ba16804 100644
--- a/docs/en/connector-v2/sink/Hive.md
+++ b/docs/en/connector-v2/sink/Hive.md
@@ -122,9 +122,6 @@ source {
}
}
-transform {
-}
-
sink {
# choose stdout output plugin to output data to console
diff --git a/docs/en/connector-v2/sink/Socket.md
b/docs/en/connector-v2/sink/Socket.md
index 2a29a17c4..a1ab7b440 100644
--- a/docs/en/connector-v2/sink/Socket.md
+++ b/docs/en/connector-v2/sink/Socket.md
@@ -69,10 +69,6 @@ source {
}
}
-transform {
- sql = "select name, age from fake"
-}
-
sink {
Socket {
host = "localhost"
diff --git a/docs/en/connector-v2/sink/common-options.md
b/docs/en/connector-v2/sink/common-options.md
index 53c623086..2f6b83471 100644
--- a/docs/en/connector-v2/sink/common-options.md
+++ b/docs/en/connector-v2/sink/common-options.md
@@ -1,4 +1,4 @@
-# Common Options
+# Sink Common Options
> Common parameters of sink connectors
@@ -32,24 +32,27 @@ source {
}
transform {
- sql {
+ Filter {
source_table_name = "fake"
- sql = "select name from fake"
+ fields = [name]
result_table_name = "fake_name"
}
- sql {
+ Filter {
source_table_name = "fake"
- sql = "select age from fake"
+ fields = [age]
result_table_name = "fake_age"
}
}
sink {
- console {
- parallelism = 3
+ Console {
source_table_name = "fake_name"
}
+ Console {
+ source_table_name = "fake_age"
+ }
}
```
-> If `source_table_name` is not specified, the console outputs the data of the
last transform, and if it is set to `fake_name` , it will output the data of
`fake_name`
+> If the job has only one source, one (or zero) transform, and one sink, you do not need to specify `source_table_name` and `result_table_name` for the connectors.
+> If there is more than one source, transform, or sink in the job, you must specify `source_table_name` and `result_table_name` for each connector.
diff --git a/docs/en/connector-v2/source/Socket.md
b/docs/en/connector-v2/source/Socket.md
index 55c3c200f..eccc1f4fe 100644
--- a/docs/en/connector-v2/source/Socket.md
+++ b/docs/en/connector-v2/source/Socket.md
@@ -62,9 +62,6 @@ source {
}
}
-transform {
-}
-
sink {
Console {}
}
diff --git a/docs/en/connector-v2/source/common-options.md
b/docs/en/connector-v2/source/common-options.md
index 7fc32c505..e1072e680 100644
--- a/docs/en/connector-v2/source/common-options.md
+++ b/docs/en/connector-v2/source/common-options.md
@@ -1,4 +1,4 @@
-# Common Options
+# Source Common Options
> Common parameters of source connectors
diff --git a/docs/en/seatunnel-engine/deployment.md
b/docs/en/seatunnel-engine/deployment.md
index 7deb082a8..1e37dfd02 100644
--- a/docs/en/seatunnel-engine/deployment.md
+++ b/docs/en/seatunnel-engine/deployment.md
@@ -175,7 +175,7 @@ mkdir -p $SEATUNNEL_HOME/logs
nohup seatunnel-cluster.sh &
```
-The logs will write in `$SEATUNNEL_HOME/logs/seatunnel-server.log`
+The logs will write in `$SEATUNNEL_HOME/logs/seatunnel-engine-server.log`
## 8. Install SeaTunnel Engine Client
diff --git a/docs/en/start-v2/locally/quick-start-flink.md
b/docs/en/start-v2/locally/quick-start-flink.md
index f7f58a8d1..3cc78a8db 100644
--- a/docs/en/start-v2/locally/quick-start-flink.md
+++ b/docs/en/start-v2/locally/quick-start-flink.md
@@ -40,10 +40,6 @@ source {
}
}
-transform {
-
-}
-
sink {
Console {}
}
diff --git a/docs/en/start-v2/locally/quick-start-seatunnel-engine.md
b/docs/en/start-v2/locally/quick-start-seatunnel-engine.md
index d2ce05b37..2e9d86b92 100644
--- a/docs/en/start-v2/locally/quick-start-seatunnel-engine.md
+++ b/docs/en/start-v2/locally/quick-start-seatunnel-engine.md
@@ -32,10 +32,6 @@ source {
}
}
-transform {
-
-}
-
sink {
Console {}
}
@@ -82,7 +78,7 @@ row=16 : SGZCr, 94186144
## What's More
-For now, you are already take a quick look about SeaTunnel, you could see
[connector](/docs/category/connector-v2) to find all
+Now that you have taken a quick look at SeaTunnel, see [connector](../../connector-v2/source/FakeSource.md) to find all
the sources and sinks SeaTunnel supports. Or see [SeaTunnel Engine](../../seatunnel-engine/about.md) if you want to know more about SeaTunnel Engine.
SeaTunnel also supports running jobs in Spark/Flink. You can see [Quick Start
With Spark](quick-start-spark.md) or [Quick Start With
Flink](quick-start-flink.md).
diff --git a/docs/en/start-v2/locally/quick-start-spark.md
b/docs/en/start-v2/locally/quick-start-spark.md
index a66c27712..576b284a9 100644
--- a/docs/en/start-v2/locally/quick-start-spark.md
+++ b/docs/en/start-v2/locally/quick-start-spark.md
@@ -41,10 +41,6 @@ source {
}
}
-transform {
-
-}
-
sink {
Console {}
}
diff --git a/docs/en/transform-v2/common-options.md
b/docs/en/transform-v2/common-options.md
new file mode 100644
index 000000000..c858245fc
--- /dev/null
+++ b/docs/en/transform-v2/common-options.md
@@ -0,0 +1,23 @@
+# Transform Common Options
+
+> Common parameters of transform plugins
+
+| name | type | required | default value |
+|-------------------| ------ | -------- | ------------- |
+| result_table_name | string | no | - |
+| source_table_name | string | no | - |
+
+### source_table_name [string]
+
+When `source_table_name` is not specified, the current plug-in processes the
data set `(dataset)` output by the previous plug-in in the configuration file;
+
+When `source_table_name` is specified, the current plugin is processing the
data set corresponding to this parameter.
+
+### result_table_name [string]
+
+When `result_table_name` is not specified, the data processed by this plugin
will not be registered as a data set that can be directly accessed by other
plugins, or called a temporary table `(table)`;
+
+When `result_table_name` is specified, the data processed by this plugin will
be registered as a data set `(dataset)` that can be directly accessed by other
plugins, or called a temporary table `(table)` . The dataset registered here
can be directly accessed by other plugins by specifying `source_table_name` .
+
+## Examples
+
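+A minimal sketch (the table names here are illustrative) of how the two options chain plugins together: the transform reads the table registered by the source and registers its own result table for the sink.
+
+```
+source {
+  FakeSource {
+    result_table_name = "fake"
+  }
+}
+
+transform {
+  Filter {
+    source_table_name = "fake"
+    result_table_name = "fake1"
+    fields = [name, card]
+  }
+}
+
+sink {
+  Console {
+    source_table_name = "fake1"
+  }
+}
+```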
diff --git a/docs/en/transform-v2/copy.md b/docs/en/transform-v2/copy.md
new file mode 100644
index 000000000..39ad7f58c
--- /dev/null
+++ b/docs/en/transform-v2/copy.md
@@ -0,0 +1,66 @@
+# Copy
+
+> Copy transform plugin
+
+## Description
+
+Copy a field to a new field.
+
+## Options
+
+| name | type | required | default value |
+|---------------| ------ | -------- |---------------|
+| src_field | string | yes | |
+| dest_field | string | yes | |
+
+### src_field [string]
+
+The source field name you want to copy
+
+### dest_field [string]
+
+The destination field name
+
+### common options [string]
+
+Transform plugin common parameters, please refer to [Transform
Plugin](common-options.md) for details
+
+## Example
+
+The data read from source is a table like this:
+
+| name | age | card |
+|----------|-----|------|
+| Joy Ding | 20 | 123 |
+| May Ding | 20 | 123 |
+| Kin Dom | 20 | 123 |
+| Joy Dom | 20 | 123 |
+
+We want to copy the field `name` to a new field `name1`; we can add a `Copy` transform like this
+
+```
+transform {
+ Copy {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ src_field = "name"
+ dest_field = "name1"
+ }
+}
+```
+
+Then the data in result table `fake1` will look like this
+
+| name | age | card | name1 |
+|----------|-----|------|----------|
+| Joy Ding | 20 | 123 | Joy Ding |
+| May Ding | 20 | 123 | May Ding |
+| Kin Dom | 20 | 123 | Kin Dom |
+| Joy Dom | 20 | 123 | Joy Dom |
+
+
+## Changelog
+
+### new version
+
+- Add Copy Transform Connector
\ No newline at end of file
diff --git a/docs/en/transform-v2/filter-rowkind.md
b/docs/en/transform-v2/filter-rowkind.md
new file mode 100644
index 000000000..e67bab89a
--- /dev/null
+++ b/docs/en/transform-v2/filter-rowkind.md
@@ -0,0 +1,67 @@
+# FilterRowKind
+
+> FilterRowKind transform plugin
+
+## Description
+
+Filter the data by RowKind
+
+## Options
+
+| name          | type  | required | default value |
+|---------------|-------|----------|---------------|
+| include_kinds | array | no       |               |
+| exclude_kinds | array | no       |               |
+
+### include_kinds [array]
+
+The row kinds to include
+
+### exclude_kinds [array]
+
+The row kinds to exclude.
+
+Exactly one of `include_kinds` and `exclude_kinds` must be configured.
+
+### common options [string]
+
+Transform plugin common parameters, please refer to [Transform
Plugin](common-options.md) for details
+
+## Examples
+
+The RowKind of the data generated by FakeSource is `INSERT`. If we use the `FilterRowKind` transform and exclude `INSERT` data, we will write zero rows to the sink.
+
+```hocon
+
+env {
+ job.mode = "BATCH"
+}
+
+source {
+ FakeSource {
+ result_table_name = "fake"
+ row.num = 100
+ schema = {
+ fields {
+ id = "int"
+ name = "string"
+ age = "int"
+ }
+ }
+ }
+}
+
+transform {
+ FilterRowKind {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ exclude_kinds = ["INSERT"]
+ }
+}
+
+sink {
+ Console {
+ source_table_name = "fake1"
+ }
+}
+```
\ No newline at end of file
diff --git a/docs/en/transform-v2/filter.md b/docs/en/transform-v2/filter.md
new file mode 100644
index 000000000..0a4722a9c
--- /dev/null
+++ b/docs/en/transform-v2/filter.md
@@ -0,0 +1,60 @@
+# Filter
+
+> Filter transform plugin
+
+## Description
+
+Filter the fields of each row, keeping only the fields listed.
+
+## Options
+
+| name | type | required | default value |
+|--------------|-------| -------- |---------------|
+| fields | array | yes | |
+
+### fields [array]
+
+The list of fields to keep. Fields not in the list will be deleted.
+
+### common options [string]
+
+Transform plugin common parameters, please refer to [Transform
Plugin](common-options.md) for details
+
+## Example
+
+The data read from source is a table like this:
+
+| name | age | card |
+|----------|-----|------|
+| Joy Ding | 20 | 123 |
+| May Ding | 20 | 123 |
+| Kin Dom | 20 | 123 |
+| Joy Dom | 20 | 123 |
+
+We want to delete the field `age`; we can add a `Filter` transform like this
+
+```
+transform {
+ Filter {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ fields = [name, card]
+ }
+}
+```
+
+Then the data in result table `fake1` will look like this
+
+| name | card |
+|----------|------|
+| Joy Ding | 123 |
+| May Ding | 123 |
+| Kin Dom | 123 |
+| Joy Dom | 123 |
+
+
+## Changelog
+
+### new version
+
+- Add Filter Transform Connector
\ No newline at end of file
diff --git a/docs/en/transform-v2/replace.md b/docs/en/transform-v2/replace.md
new file mode 100644
index 000000000..f23a50835
--- /dev/null
+++ b/docs/en/transform-v2/replace.md
@@ -0,0 +1,121 @@
+# Replace
+
+> Replace transform plugin
+
+## Description
+
+Examines the string value in a given field and replaces any substring that matches the given string literal or regex with the given replacement.
+
+## Options
+
+| name | type | required | default value |
+| -------------- | ------ | -------- |---------------|
+| replace_field | string | yes | |
+| pattern | string | yes | - |
+| replacement | string | yes | - |
+| is_regex | boolean| no | false |
+| replace_first | boolean| no | false |
+
+### replace_field [string]
+
+The field you want to replace
+
+### pattern [string]
+
+The old string that will be replaced
+
+### replacement [string]
+
+The new string to replace with
+
+### is_regex [boolean]
+
+Whether to use regex for string matching
+
+### replace_first [boolean]
+
+Whether to replace only the first matched string. Only used when `is_regex = true`.
+
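As a rough mental model (an illustrative sketch, not the actual SeaTunnel implementation; the method name `apply` is an assumption), the four options plausibly combine the way Java's standard string replacement methods do:

```java
public class ReplaceSemantics {

    // Illustrative sketch of how pattern, replacement, is_regex and
    // replace_first might combine; this is an assumption, not the
    // actual SeaTunnel transform code.
    public static String apply(String value, String pattern, String replacement,
                               boolean isRegex, boolean replaceFirst) {
        if (isRegex) {
            return replaceFirst
                    ? value.replaceFirst(pattern, replacement)   // first regex match only
                    : value.replaceAll(pattern, replacement);    // every regex match
        }
        // Literal mode: every occurrence of the literal pattern is replaced.
        return value.replace(pattern, replacement);
    }

    public static void main(String[] args) {
        System.out.println(apply("Joy Ding", " ", "_", true, false)); // Joy_Ding
        System.out.println(apply("a1b2c3", "\\d", "#", true, true));  // a#b2c3
        System.out.println(apply("a.b.c", ".", "-", false, false));   // a-b-c
    }
}
```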
+### common options [string]
+
+Transform plugin common parameters, please refer to [Transform
Plugin](common-options.md) for details
+
+## Example
+
+The data read from source is a table like this:
+
+| name | age | card |
+|----------|-----|------|
+| Joy Ding | 20 | 123 |
+| May Ding | 20 | 123 |
+| Kin Dom | 20 | 123 |
+| Joy Dom | 20 | 123 |
+
+We want to replace the space character ` ` with `_` in the `name` field. Then we can add a `Replace` transform like this:
+
+```
+transform {
+ Replace {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ replace_field = "name"
+ pattern = " "
+ replacement = "_"
+ is_regex = true
+ }
+}
+```
+
+Then the data in result table `fake1` will be updated to
+
+
+| name | age | card |
+|----------|-----|------|
+| Joy_Ding | 20 | 123 |
+| May_Ding | 20 | 123 |
+| Kin_Dom | 20 | 123 |
+| Joy_Dom | 20 | 123 |
+
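+Since the pattern above is a single literal character, the same result can presumably be achieved without regex by leaving `is_regex` at its default of `false`:
+
+```
+transform {
+  Replace {
+    source_table_name = "fake"
+    result_table_name = "fake1"
+    replace_field = "name"
+    pattern = " "
+    replacement = "_"
+  }
+}
+```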
+## Job Config Example
+
+```
+env {
+ job.mode = "BATCH"
+}
+
+source {
+ FakeSource {
+ result_table_name = "fake"
+ row.num = 100
+ schema = {
+ fields {
+ id = "int"
+ name = "string"
+ }
+ }
+ }
+}
+
+transform {
+ Replace {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ replace_field = "name"
+ pattern = ".+"
+ replacement = "b"
+ is_regex = true
+ }
+}
+
+sink {
+ Console {
+ source_table_name = "fake1"
+ }
+}
+```
+
+## Changelog
+
+### new version
+
+- Add Replace Transform Connector
\ No newline at end of file
diff --git a/docs/en/transform-v2/split.md b/docs/en/transform-v2/split.md
new file mode 100644
index 000000000..9be8b2314
--- /dev/null
+++ b/docs/en/transform-v2/split.md
@@ -0,0 +1,72 @@
+# Split
+
+> Split transform plugin
+
+## Description
+
+Split a field to more than one field.
+
+## Options
+
+| name | type | required | default value |
+|----------------|--------| -------- |---------------|
+| separator | string | yes | |
+| split_field | string | yes | |
+| output_fields | array | yes | |
+
+### separator [string]
+
+The separator used to split the field
+
+### split_field [string]
+
+The field to be split
+
+### output_fields [array]
+
+The result fields after split
+
+### common options [string]
+
+Transform plugin common parameters, please refer to [Transform
Plugin](common-options.md) for details
+
+## Example
+
+The data read from source is a table like this:
+
+| name | age | card |
+|----------|-----|------|
+| Joy Ding | 20 | 123 |
+| May Ding | 20 | 123 |
+| Kin Dom | 20 | 123 |
+| Joy Dom | 20 | 123 |
+
+We want to split the `name` field into `first_name` and `second_name`; we can add a `Split` transform like this
+
+```
+transform {
+ Split {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ separator = " "
+ split_field = "name"
+ output_fields = [first_name, second_name]
+ }
+}
+```
+
+Then the data in result table `fake1` will look like this
+
+| name     | age | card | first_name | second_name |
+|----------|-----|------|------------|-------------|
+| Joy Ding | 20 | 123 | Joy | Ding |
+| May Ding | 20 | 123 | May | Ding |
+| Kin Dom | 20 | 123 | Kin | Dom |
+| Joy Dom | 20 | 123 | Joy | Dom |
+
+
+## Changelog
+
+### new version
+
+- Add Split Transform Connector
\ No newline at end of file
diff --git a/docs/en/transform/common-options.mdx
b/docs/en/transform/common-options.mdx
deleted file mode 100644
index 9f20ea13a..000000000
--- a/docs/en/transform/common-options.mdx
+++ /dev/null
@@ -1,118 +0,0 @@
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
-# Common Options
-
-> Common parameters of transform plugins
-
-:::tip
-
-This transform both supported by engine Spark and Flink.
-
-:::
-
-## Transform Plugin common parameters
-
-<Tabs
- groupId="engine-type"
- defaultValue="spark"
- values={[
- {label: 'Spark', value: 'spark'},
- {label: 'Flink', value: 'flink'},
- ]}>
-<TabItem value="spark">
-
-| name | type | required | default value |
-| ----------------- | ------ | -------- | ------------- |
-| source_table_name | string | no | - |
-| result_table_name | string | no | - |
-
-</TabItem>
-<TabItem value="flink">
-
-| name | type | required | default value |
-| ----------------- | ------ | -------- | ------------- |
-| source_table_name | string | no | - |
-| result_table_name | string | no | - |
-| field_name | string | no | - |
-
-### field_name [string]
-
-When the data is obtained from the upper-level plugin, you can specify the
name of the obtained field, which is convenient for use in subsequent sql
plugins.
-
-</TabItem>
-</Tabs>
-
-### source_table_name [string]
-
-When `source_table_name` is not specified, the current plug-in processes the
data set `(dataset)` output by the previous plug-in in the configuration file;
-
-When `source_table_name` is specified, the current plugin is processing the
data set corresponding to this parameter.
-
-### result_table_name [string]
-
-When `result_table_name` is not specified, the data processed by this plugin
will not be registered as a data set that can be directly accessed by other
plugins, or called a temporary table `(table)`;
-
-When `result_table_name` is specified, the data processed by this plugin will
be registered as a data set `(dataset)` that can be directly accessed by other
plugins, or called a temporary table `(table)` . The dataset registered here
can be directly accessed by other plugins by specifying `source_table_name` .
-
-## Examples
-
-<Tabs
- groupId="engine-type"
- defaultValue="spark"
- values={[
- {label: 'Spark', value: 'spark'},
- {label: 'Flink', value: 'flink'},
- ]}>
-<TabItem value="spark">
-
-```bash
-split {
- source_table_name = "source_view_table"
- source_field = "message"
- delimiter = "&"
- fields = ["field1", "field2"]
- result_table_name = "result_view_table"
-}
-```
-
-> The `Split` plugin will process the data in the temporary table
`source_view_table` and register the processing result as a temporary table
named `result_view_table`. This temporary table can be used by any subsequent
`Filter` or `Output` plugin by specifying `source_table_name` .
-
-```bash
-split {
- source_field = "message"
- delimiter = "&"
- fields = ["field1", "field2"]
-}
-```
-
-> Note: If `source_table_name` is not configured, output the processing result
of the last `Transform` plugin in the configuration file
-
-</TabItem>
-<TabItem value="flink">
-
-```bash
-source {
- FakeSourceStream {
- result_table_name = "fake_1"
- field_name = "name,age"
- }
- FakeSourceStream {
- result_table_name = "fake_2"
- field_name = "name,age"
- }
-}
-
-transform {
- sql {
- source_table_name = "fake_1"
- sql = "select name from fake_1"
- result_table_name = "fake_name"
- }
-}
-```
-
-> If `source_table_name` is not specified, the sql plugin will process the
data of `fake_2` , and if it is set to `fake_1` , it will process the data of
`fake_1` .
-
-</TabItem>
-</Tabs>
\ No newline at end of file
diff --git a/docs/en/transform/json.md b/docs/en/transform/json.md
deleted file mode 100644
index 5ec5ba5c1..000000000
--- a/docs/en/transform/json.md
+++ /dev/null
@@ -1,197 +0,0 @@
-# Json
-
-> Json transform plugin
-
-## Description
-
-Json analysis of the specified fields of the original data set
-
-:::tip
-
-This transform **ONLY** supported by Spark.
-
-:::
-
-## Options
-
-| name | type | required | default value |
-| -------------- | ------ | -------- | ------------- |
-| source_field | string | no | raw_message |
-| target_field | string | no | __root__ |
-| schema_dir | string | no | - |
-| schema_file | string | no | - |
-| common-options | string | no | - |
-
-### source_field [string]
-
-Source field, if not configured, the default is `raw_message`
-
-### target_field [string]
-
-The target field, if it is not configured, the default is `__root__` , and the
result of Json parsing will be uniformly placed at the top of the `Dataframe`
-
-### schema_dir [string]
-
-Style directory, if not configured, the default is
`$seatunnelRoot/plugins/json/files/schemas/`
-
-### schema_file [string]
-
-The style file name, if it is not configured, the default is empty, that is,
the structure is not specified, and the system derives it by itself according
to the input of the data source.
-
-### common options [string]
-
-Transform plugin common parameters, please refer to [Transform
Plugin](common-options.mdx) for details
-
-## Schema Use cases
-
-- `json schema` usage scenarios
-
-The multiple data sources of a single task may contain different styles of
json data. For example, the `topicA` style from `Kafka` is
-
-```json
-{
- "A": "a_val",
- "B": "b_val"
-}
-```
-
-The style from `topicB` is
-
-```json
-{
- "C": "c_val",
- "D": "d_val"
-}
-```
-
-When running `Transform` , you need to fuse the data of `topicA` and `topicB`
into a wide table for calculation. You can specify a `schema` whose content
style is:
-
-```json
-{
- "A": "a_val",
- "B": "b_val",
- "C": "c_val",
- "D": "d_val"
-}
-```
-
-Then the fusion output result of `topicA` and `topicB` is:
-
-```bash
-+-----+-----+-----+-----+
-|A |B |C |D |
-+-----+-----+-----+-----+
-|a_val|b_val|null |null |
-|null |null |c_val|d_val|
-+-----+-----+-----+-----+
-```
-
-## Examples
-
-### Do not use `target_field`
-
-```bash
-json {
- source_field = "message"
-}
-```
-
-- Source
-
-```bash
-+----------------------------+
-|message |
-+----------------------------+
-|{"name": "ricky", "age": 24}|
-|{"name": "gary", "age": 28} |
-+----------------------------+
-```
-
-- Sink
-
-```bash
-+----------------------------+---+-----+
-|message |age|name |
-+----------------------------+---+-----+
-|{"name": "gary", "age": 28} |28 |gary |
-|{"name": "ricky", "age": 23}|23 |ricky|
-+----------------------------+---+-----+
-```
-
-### Use `target_field`
-
-```bash
-json {
- source_field = "message"
- target_field = "info"
-}
-```
-
-- Souce
-
-```bash
-+----------------------------+
-|message |
-+----------------------------+
-|{"name": "ricky", "age": 24}|
-|{"name": "gary", "age": 28} |
-+----------------------------+
-```
-
-- Sink
-
-```bash
-+----------------------------+----------+
-|message |info |
-+----------------------------+----------+
-|{"name": "gary", "age": 28} |[28,gary] |
-|{"name": "ricky", "age": 23}|[23,ricky]|
-+----------------------------+----------+
-```
-
-> The results of json processing support `select * from where info.age = 23`
such SQL statements
-
-### Use `schema_file`
-
-```bash
-json {
- source_field = "message"
- schema_file = "demo.json"
-}
-```
-
-- Schema
-
-Place the following content in
`~/seatunnel/plugins/json/files/schemas/demo.json` of Driver Node:
-
-```json
-{
- "name": "demo",
- "age": 24,
- "city": "LA"
-}
-```
-
-- Source
-
-```bash
-+----------------------------+
-|message |
-+----------------------------+
-|{"name": "ricky", "age": 24}|
-|{"name": "gary", "age": 28} |
-+----------------------------+
-```
-
-- Sink
-
-```bash
-+----------------------------+---+-----+-----+
-|message |age|name |city |
-+----------------------------+---+-----+-----+
-|{"name": "gary", "age": 28} |28 |gary |null |
-|{"name": "ricky", "age": 23}|23 |ricky|null |
-+----------------------------+---+-----+-----+
-```
-
-> If you use `cluster mode` for deployment, make sure that the `json schemas`
directory is packaged in `plugins.tar.gz`
diff --git a/docs/en/transform/nullRate.md b/docs/en/transform/nullRate.md
deleted file mode 100644
index a5c7bf1eb..000000000
--- a/docs/en/transform/nullRate.md
+++ /dev/null
@@ -1,69 +0,0 @@
-# NullRate
-
-> NULL rate transform plugin
-
-## Description
-
-When there is a large amount of data, the final result will always be greatly
affected by the problem of data null value. Therefore, early null value
detection is particularly important. For this reason, this function came into
being
-
-:::tip
-
-This transform **ONLY** supported by Spark.
-
-:::
-
-## Options
-
-| name | type | required | default value |
-| -------------------------| ------------ | -------- | ------------- |
-| fields | string_list | yes | - |
-| rates | double_list | yes | - |
-| throw_exception_enable | boolean | no | - |
-| save_to_table_name | string | no | - |
-
-
-
-### field [string_list]
-
-Which fields do you want to monitor .
-
-### rates [double_list]
-
-It is consistent with the number of fields. Double type indicates the set null
rate value .
-
-### throw_exception_enable [boolean]
-
-Whether to throw an exception when it is greater than the set value. The
default value is false .
-
-### save_to_table_name [string]
-
-Whether the current verification value is output to the table. It is not
output by defaul .
-
-### common options [string]
-
-Transform plugin common parameters, please refer to [Transform
Plugin](common-options.mdx) for details
-
-## Examples
-
-```bash
-nullRate {
-  fields = ["msg", "name"]
-  rates = [10.0, 3.45]
-  save_to_table_name = "tmp"
-  throw_exception_enable = true
-}
-```
-
-Use `NullRate` in transform's Dataset.
-
-```bash
- transform {
- NullRate {
- fields = ["msg", "name"]
- rates = [10.0,3.45]
- save_to_table_name = "tmp"
- throw_exception_enable = true
- }
- }
-```
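The rate check described above can be sketched in plain Java (an editor-added, hypothetical helper, not SeaTunnel code): compute the percentage of nulls per monitored field and compare it against the configured threshold from `rates`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullRateSketch {
    // Percentage of null values in one column; 0.0 for an empty column.
    static double nullRate(List<Object> column) {
        if (column.isEmpty()) {
            return 0.0;
        }
        long nulls = column.stream().filter(Objects::isNull).count();
        return 100.0 * nulls / column.size();
    }

    public static void main(String[] args) {
        List<Object> msg = Arrays.asList("a", null, "c", null);
        double rate = nullRate(msg);  // 2 nulls out of 4 values -> 50.0
        double threshold = 10.0;      // from the `rates` option
        // With throw_exception_enable = true the transform would fail at this point
        System.out.println(rate > threshold
                ? "null rate " + rate + " exceeds " + threshold
                : "ok");
    }
}
```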
diff --git a/docs/en/transform/nulltf.md b/docs/en/transform/nulltf.md
deleted file mode 100644
index 55f124d10..000000000
--- a/docs/en/transform/nulltf.md
+++ /dev/null
@@ -1,75 +0,0 @@
-# Nulltf
-
-> NULL default value transform plugin
-
-## Description
-
-Set default values for null fields.
-
-:::tip
-
-This transform is only supported by the Spark engine.
-
-:::
-
-## Options
-
-| name | type | required | default value |
-| ------------------- | ------- | -------- | ------------- |
-| fields | array | no | - |
-
-### fields [list]
-
-A list of fields whose default value will be set.
-The default value of a field can be set in the form "field: value". If not set, a default value is chosen according to the field type.
-
-## Examples
-
-the configuration
-
-```bash
- nulltf {
- fields {
- name: "",
- price: 0,
- num: 100,
- flag: false,
- dt_timestamp: "2022-05-18 13:51:40.603",
- dt_date: "2022-05-19"
- }
- }
-```
-
-Before using the nulltf transform:
-
-```bash
-+-----+-----+----+-----+--------------------+----------+
-| name|price| num| flag| dt_timestamp| dt_date|
-+-----+-----+----+-----+--------------------+----------+
-|名称1| 22.5| 100|false|2022-05-20 14:34:...|2022-05-20|
-| null| 22.5| 100|false|2022-05-20 14:35:...|2022-05-20|
-|名称1| null| 100|false|2022-05-20 14:35:...|2022-05-20|
-|名称1| 22.5|null|false|2022-05-20 14:36:...|2022-05-20|
-|名称1| 22.5| 100| null|2022-05-20 14:36:...|2022-05-20|
-|名称1| 22.5| 100|false| null|2022-05-20|
-|名称1| 22.5| 100|false|2022-05-20 14:37:...| null|
-+-----+-----+----+-----+--------------------+----------+
-```
-
-After using the nulltf transform:
-
-```bash
-+-----+-----+----+-----+--------------------+----------+
-| name|price| num| flag|        dt_timestamp|   dt_date|
-+-----+-----+----+-----+--------------------+----------+
-|名称1| 22.5| 100|false|2022-05-20 14:34:...|2022-05-20|
-|     | 22.5| 100|false|2022-05-20 14:35:...|2022-05-20|
-|名称1|  0.0| 100|false|2022-05-20 14:35:...|2022-05-20|
-|名称1| 22.5| 100|false|2022-05-20 14:36:...|2022-05-20|
-|名称1| 22.5| 100|false|2022-05-20 14:36:...|2022-05-20|
-|名称1| 22.5| 100|false|2022-05-18 13:51:...|2022-05-20|
-|名称1| 22.5| 100|false|2022-05-20 14:37:...|2022-05-19|
-+-----+-----+----+-----+--------------------+----------+
-```
-
-
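The defaulting behavior shown in the tables above amounts to a per-field null-coalesce. A minimal sketch in plain Java (editor-added illustration, not SeaTunnel code; names are hypothetical), using a map of configured defaults:

```java
import java.util.HashMap;
import java.util.Map;

public class NullDefaultSketch {
    // Replace null values in a row with the configured default for that field.
    static Map<String, Object> applyDefaults(Map<String, Object> row, Map<String, Object> defaults) {
        Map<String, Object> out = new HashMap<>(row);
        out.replaceAll((field, value) -> value == null ? defaults.get(field) : value);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("name", "");   // mirrors the `fields` block above
        defaults.put("price", 0.0);
        defaults.put("num", 100);

        Map<String, Object> row = new HashMap<>();
        row.put("name", null);
        row.put("price", 22.5);
        row.put("num", null);

        System.out.println(applyDefaults(row, defaults));
    }
}
```

Fields without a configured default would fall back to a type-derived default, as the option description states.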
diff --git a/docs/en/transform/replace.md b/docs/en/transform/replace.md
deleted file mode 100644
index 1bf57fc7f..000000000
--- a/docs/en/transform/replace.md
+++ /dev/null
@@ -1,81 +0,0 @@
-# Replace
-
-> Replace transform plugin
-
-## Description
-
-Examines the string value of a given field and replaces substrings that match the given string literal or regex with the given replacement.
-
-:::tip
-
-This transform is **ONLY** supported by Spark.
-
-:::
-
-## Options
-
-| name | type | required | default value |
-| -------------- | ------ | -------- | ------------- |
-| source_field | string | no | raw_message |
-| fields | string | yes | - |
-| pattern | string | yes | - |
-| replacement | string | yes | - |
-| is_regex | boolean| no | false |
-| replace_first | boolean| no | false |
-
-### source_field [string]
-
-Source field, if not configured, the default is `raw_message`
-
-### fields [string]
-
-The name of the field to be replaced.
-
-### pattern [string]
-
-The string to match.
-
-### replacement [string]
-
-The replacement pattern (if is_regex is true) or string literal (if is_regex is false).
-
-### is_regex [boolean]
-
-Whether or not to interpret the pattern as a regex (true) or a string literal (false).
-
-### replace_first [boolean]
-
-Whether or not to skip any matches beyond the first match.
-
-### common options [string]
-
-Transform plugin common parameters, please refer to [Transform Plugin](common-options.mdx) for details
-
-## Examples
-
-The word `a` will be replaced by `b` in the `message` field values.
-
-```bash
-replace {
- source_field = "message"
- fields = "_replaced"
- pattern = "a"
- replacement = "b"
-}
-```
-
-Use `Replace` as a UDF in SQL.
-
-```bash
- Replace {
- fields = "_replaced"
- pattern = "([^ ]*) ([^ ]*)"
- replacement = "$2"
-    is_regex = true
-    replace_first = true
- }
-
-  # Use the replace function (confirm that the fake table exists)
-  sql {
-    sql = "select * from (select raw_message, replace(raw_message) as info_row from fake) t1"
- }
-```
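The `pattern`/`replacement`/`is_regex`/`replace_first` semantics described above correspond to Java's standard string-replacement methods. A short editor-added sketch, not plugin code:

```java
public class ReplaceSketch {
    public static void main(String[] args) {
        String value = "a1 a2 a3";
        // is_regex = false: the pattern is a string literal, all occurrences replaced
        System.out.println(value.replace("a", "b"));          // b1 b2 b3
        // is_regex = true: the pattern is interpreted as a regex
        System.out.println(value.replaceAll("a\\d", "X"));    // X X X
        // replace_first = true: matches beyond the first are skipped
        System.out.println(value.replaceFirst("a\\d", "X"));  // X a2 a3
    }
}
```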
diff --git a/docs/en/transform/split.mdx b/docs/en/transform/split.mdx
deleted file mode 100644
index abc288ec1..000000000
--- a/docs/en/transform/split.mdx
+++ /dev/null
@@ -1,124 +0,0 @@
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
-# Split
-
-> Split transform plugin
-
-## Description
-
-Defines a string-splitting function, which is used to split the specified field in the Sql plugin.
-
-:::tip
-
-This transform is supported by both the Spark and Flink engines.
-
-:::
-
-## Options
-
-<Tabs
- groupId="engine-type"
- defaultValue="spark"
- values={[
- {label: 'Spark', value: 'spark'},
- {label: 'Flink', value: 'flink'},
- ]}>
-<TabItem value="spark">
-
-| name | type | required | default value |
-| -------------- | ------ | -------- | ------------- |
-| separator | string | no | " " |
-| fields | array | yes | - |
-| source_field | string | no | raw_message |
-| target_field | string | no | *root* |
-| common-options | string | no | - |
-
-### separator [string]
-
-The separator by which the input string is split. The default separator is a space `(" ")`.
-Note: if the separator contains special regex characters, you need to escape them, e.g. "\\|"
-
-### source_field [string]
-
-The source field of the string before splitting; if not configured, the default is `raw_message`
-
-### target_field [string]
-
-`target_field` specifies where the split fields are added in the Event. If it is not configured, the default is `_root_`, that is, all split fields are added to the top level of the Event. If a specific field is specified, the split fields are added under that field.
-
-</TabItem>
-<TabItem value="flink">
-
-| name | type | required | default value |
-| -------------- | ------ | -------- | ------------- |
-| separator | string | no | , |
-| fields | array | yes | - |
-| common-options | string | no | - |
-
-### separator [string]
-
-The specified delimiter, the default is `,`
-
-</TabItem>
-</Tabs>
-
-### fields [list]
-
-A list of field names assigned, in order, to the strings produced by the split. If `fields` is longer than the split result, the extra fields are assigned empty values.
-
-### common options [string]
-
-Transform plugin common parameters, please refer to [Transform Plugin](common-options.mdx) for details
-
-## Examples
-
-<Tabs
- groupId="engine-type"
- defaultValue="spark"
- values={[
- {label: 'Spark', value: 'spark'},
- {label: 'Flink', value: 'flink'},
- ]}>
-<TabItem value="spark">
-
-Split the `message` field in the source data according to `&`; you can use `field1` or `field2` as the key to get the corresponding value
-
-```bash
-split {
- source_field = "message"
- separator = "&"
- fields = ["field1", "field2"]
-}
-```
-
-Split the `message` field in the source data according to `,`; the split field is `info`, and you can use `info.field1` or `info.field2` as the key to get the corresponding value
-
-```bash
-split {
- source_field = "message"
- target_field = "info"
- separator = ","
- fields = ["field1", "field2"]
-}
-```
-
-</TabItem>
-<TabItem value="flink">
-
-</TabItem>
-</Tabs>
-
-Use `Split` as a UDF in SQL.
-
-```bash
- # This just created a udf called split
- Split{
- separator = "#"
- fields = ["name","age"]
- }
- # Use the split function (confirm that the fake table exists)
- sql {
-    sql = "select * from (select raw_message,split(raw_message) as info_row from fake) t1"
- }
-```
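The separator-escaping note above ("\\|") mirrors the behavior of Java's regex-based `String.split`, which this illustration uses. An editor-added sketch, not plugin code:

```java
import java.util.Arrays;

public class SplitSketch {
    public static void main(String[] args) {
        // A plain separator such as `&` needs no escaping
        System.out.println(Arrays.toString("field1&field2".split("&")));  // [field1, field2]
        // `|` is a regex metacharacter, so it must be escaped as "\\|"
        System.out.println(Arrays.toString("a|b|c".split("\\|")));        // [a, b, c]
    }
}
```

Without the escape, `split("|")` would match the empty string and split between every character.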
diff --git a/docs/en/transform/sql.md b/docs/en/transform/sql.md
deleted file mode 100644
index c06e6357c..000000000
--- a/docs/en/transform/sql.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# Sql
-
-> Sql transform plugin
-
-## Description
-
-Use SQL to process data; the engine's UDF functions are supported.
-
-:::tip
-
-This transform is supported by both the Spark and Flink engines.
-
-:::
-
-## Options
-
-| name | type | required | default value |
-| -------------- | ------ | -------- | ------------- |
-| sql | string | yes | - |
-| common-options | string | no | - |
-
-### sql [string]
-
-SQL statement; the table name used in the SQL is configured in the `Source` or `Transform` plugin
-
-### common options [string]
-
-Transform plugin common parameters, please refer to [Transform Plugin](common-options.mdx) for details
-
-## Examples
-
-### Simple Select
-
-Use the SQL plugin for field deletion. Only the `username` and `address` fields are kept; the remaining fields are discarded. `user_info` is the `result_table_name` configured by the previous plugin
-
-```bash
-sql {
- sql = "select username, address from user_info",
-}
-```
-
-### Use UDF
-
-Use the SQL plugin for data processing; use the `substring` function to extract part of the `telephone` field
-
-```bash
-sql {
- sql = "select substring(telephone, 0, 10) from user_info",
-}
-```
-
-### Use UDAF
-
-Use the SQL plugin for data aggregation; use the `avg` function to aggregate the original data set and take the average value of the `age` field
-
-```bash
-sql {
- sql = "select avg(age) from user_info",
-}
-```
-
diff --git a/docs/en/transform/udf.md b/docs/en/transform/udf.md
deleted file mode 100644
index 21a37e0e1..000000000
--- a/docs/en/transform/udf.md
+++ /dev/null
@@ -1,44 +0,0 @@
-# udf
-
-> UDF transform plugin
-
-## Description
-
-Supports using UDFs in data integration via this transform.
-You need to specify the function name and class name, and put the UDF jars in Flink's classpath or import them via 'flink run -c xxx.jar'
-
-:::tip
-
-This transform is **ONLY** supported by Flink.
-
-:::
-
-## Options
-
-| name | type | required | default value |
-| -------------- | ----------- | -------- | ------------- |
-| function | string | yes | - |
-
-### function [string]
-
-A config prefix, use like `function.test="xxx.Test"`.
-
-### common options [string]
-
-Transform plugin common parameters, please refer to [Transform Plugin](common-options.mdx) for details
-
-## Examples
-
-Use `udf` in SQL.
-
-```bash
- udf {
- function.test_1 = "com.example.udf.flink.TestUDF"
- function.test_2 = "com.example.udf.flink.TestUDTF"
- }
-
- # Use the specify function (confirm that the fake table exists)
- sql {
- sql = "select test_1(name), age from fake"
- }
-```
diff --git a/docs/en/transform/uuid.md b/docs/en/transform/uuid.md
deleted file mode 100644
index 6be962dee..000000000
--- a/docs/en/transform/uuid.md
+++ /dev/null
@@ -1,64 +0,0 @@
-# UUID
-
-> UUID transform plugin
-
-## Description
-
-Generate a universally unique identifier on a specified field.
-
-:::tip
-
-This transform is **ONLY** supported by Spark.
-
-:::
-
-## Options
-
-| name | type | required | default value |
-| -------------- | ------ | -------- | ------------- |
-| fields | string | yes | - |
-| prefix | string | no | - |
-| secure | boolean| no | false |
-
-### fields [string]
-
-The name of the field in which the UUID is generated.
-
-### prefix [string]
-
-The prefix string constant to prepend to each generated UUID.
-
-### secure [boolean]
-
-The cryptographically secure algorithm can be comparatively slow.
-The non-secure algorithm uses a secure random seed but is otherwise deterministic.
-
-### common options [string]
-
-Transform plugin common parameters, please refer to [Transform Plugin](common-options.mdx) for details
-
-## Examples
-
-```bash
-UUID {
-  fields = "u"
-  prefix = "uuid-"
-  secure = true
-}
-```
-
-Use `UUID` as a UDF in SQL.
-
-```bash
- UUID {
- fields = "u"
- prefix = "uuid-"
- secure = true
- }
-
- # Use the uuid function (confirm that the fake table exists)
- sql {
- sql = "select * from (select raw_message, UUID() as info_row from fake) t1"
- }
-```
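The `prefix` option described above simply prepends a constant to each generated identifier. Sketched here with the JDK's `java.util.UUID` (editor-added illustration; the plugin's actual generator may differ):

```java
import java.util.UUID;

public class UuidSketch {
    public static void main(String[] args) {
        String prefix = "uuid-";                    // from the `prefix` option
        String value = prefix + UUID.randomUUID();  // 36-character canonical UUID
        System.out.println(value);
        System.out.println(value.length());         // 5 + 36 = 41
    }
}
```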
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 3e2681e97..ab4c1cef1 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -138,19 +138,19 @@ const sidebars = {
},
{
"type": "category",
- "label": "Transform",
+ "label": "Transform-V2",
"link": {
"type": "generated-index",
- "title": "Transform of SeaTunnel",
-                "description": "List all transform supported Apache SeaTunnel for now.",
- "slug": "/category/transform",
- "keywords": ["transform"],
+ "title": "Transform V2 of SeaTunnel",
+                "description": "List all transform v2 supported Apache SeaTunnel for now.",
+ "slug": "/category/transform-v2",
+ "keywords": ["transform-v2"],
"image": "/img/favicon.ico"
},
"items": [
{
"type": "autogenerated",
- "dirName": "transform"
+ "dirName": "transform-v2"
}
]
},
diff --git a/seatunnel-connectors-v2/README.md
b/seatunnel-connectors-v2/README.md
index dfa7badf6..b3f8cde38 100644
--- a/seatunnel-connectors-v2/README.md
+++ b/seatunnel-connectors-v2/README.md
@@ -12,12 +12,14 @@ development at the current stage, and reduces the difficulty of merging.
### engineering structure
-- ../`seatunnel-connectors-v2`                                  connector-v2 code implementation
-- ../`seatunnel-translation`                                    translation layer for the connector-v2
-- ../seatunnel-e2e/`seatunnel-flink-connector-v2-e2e`           end to end testcase running on flink
-- ../seatunnel-e2e/`seatunnel-spark-connector-v2-e2e`           end to end testcase running on spark
-- ../seatunnel-examples/`seatunnel-flink-connector-v2-example`  seatunnel connector-v2 example use flink local running instance
-- ../seatunnel-examples/`seatunnel-spark-connector-v2-example`  seatunnel connector-v2 example use spark local running instance
+- ../`seatunnel-connectors-v2`                                  connector-v2 code implementation
+- ../`seatunnel-translation`                                    translation layer for the connector-v2
+- ../`seatunnel-transform-v2`                                   transform v2 connector implementation
+- ../seatunnel-e2e/`seatunnel-connector-v2-e2e`                 connector v2 e2e code
+- ../seatunnel-e2e/`seatunnel-flink-connector-v2-e2e`           Obsolete, replaced by seatunnel-connector-v2-e2e
+- ../seatunnel-e2e/`seatunnel-spark-connector-v2-e2e`           Obsolete, replaced by seatunnel-connector-v2-e2e
+- ../seatunnel-examples/`seatunnel-flink-connector-v2-example`  seatunnel connector-v2 example use flink local running instance
+- ../seatunnel-examples/`seatunnel-spark-connector-v2-example`  seatunnel connector-v2 example use spark local running instance
### **Example**
diff --git a/seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFaultToleranceIT.java b/seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFaultToleranceIT.java
index ec15be62f..5eb5ebb20 100644
--- a/seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFaultToleranceIT.java
+++ b/seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFaultToleranceIT.java
@@ -223,7 +223,7 @@ public class ClusterFaultToleranceIT {
clientJobProxy.cancelJob();
- Awaitility.await().atMost(20000, TimeUnit.MILLISECONDS)
+ Awaitility.await().atMost(200000, TimeUnit.MILLISECONDS)
.untilAsserted(() -> Assertions.assertTrue(
objectCompletableFuture.isDone() &&
JobStatus.CANCELED.equals(objectCompletableFuture.get())));
@@ -397,7 +397,7 @@ public class ClusterFaultToleranceIT {
Thread.sleep(10000);
clientJobProxy.cancelJob();
- Awaitility.await().atMost(20000, TimeUnit.MILLISECONDS)
+ Awaitility.await().atMost(200000, TimeUnit.MILLISECONDS)
.untilAsserted(() -> Assertions.assertTrue(
objectCompletableFuture.isDone() &&
JobStatus.CANCELED.equals(objectCompletableFuture.get())));
@@ -571,7 +571,7 @@ public class ClusterFaultToleranceIT {
Thread.sleep(10000);
clientJobProxy.cancelJob();
- Awaitility.await().atMost(20000, TimeUnit.MILLISECONDS)
+ Awaitility.await().atMost(200000, TimeUnit.MILLISECONDS)
.untilAsserted(() -> Assertions.assertTrue(
objectCompletableFuture.isDone() &&
JobStatus.CANCELED.equals(objectCompletableFuture.get())));
@@ -733,7 +733,7 @@ public class ClusterFaultToleranceIT {
Thread.sleep(10000);
clientJobProxy.cancelJob();
- Awaitility.await().atMost(20000, TimeUnit.MILLISECONDS)
+ Awaitility.await().atMost(200000, TimeUnit.MILLISECONDS)
.untilAsserted(() -> Assertions.assertTrue(
objectCompletableFuture.isDone() &&
JobStatus.CANCELED.equals(objectCompletableFuture.get())));
diff --git a/seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFaultToleranceTwoPipelineIT.java b/seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFaultToleranceTwoPipelineIT.java
index 0797c2427..379b8391c 100644
--- a/seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFaultToleranceTwoPipelineIT.java
+++ b/seatunnel-e2e/seatunnel-engine-e2e/connector-seatunnel-e2e-base/src/test/java/org/apache/seatunnel/engine/e2e/ClusterFaultToleranceTwoPipelineIT.java
@@ -229,7 +229,7 @@ public class ClusterFaultToleranceTwoPipelineIT {
clientJobProxy.cancelJob();
- Awaitility.await().atMost(20000, TimeUnit.MILLISECONDS)
+ Awaitility.await().atMost(200000, TimeUnit.MILLISECONDS)
.untilAsserted(() -> Assertions.assertTrue(
objectCompletableFuture.isDone() &&
JobStatus.CANCELED.equals(objectCompletableFuture.get())));
@@ -407,7 +407,7 @@ public class ClusterFaultToleranceTwoPipelineIT {
Thread.sleep(10000);
clientJobProxy.cancelJob();
- Awaitility.await().atMost(20000, TimeUnit.MILLISECONDS)
+ Awaitility.await().atMost(200000, TimeUnit.MILLISECONDS)
.untilAsserted(() -> Assertions.assertTrue(
objectCompletableFuture.isDone() &&
JobStatus.CANCELED.equals(objectCompletableFuture.get())));
@@ -585,7 +585,7 @@ public class ClusterFaultToleranceTwoPipelineIT {
Thread.sleep(10000);
clientJobProxy.cancelJob();
- Awaitility.await().atMost(20000, TimeUnit.MILLISECONDS)
+ Awaitility.await().atMost(200000, TimeUnit.MILLISECONDS)
.untilAsserted(() -> Assertions.assertTrue(
objectCompletableFuture.isDone() &&
JobStatus.CANCELED.equals(objectCompletableFuture.get())));
diff --git a/seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/FilterFieldTransform.java b/seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/FilterFieldTransform.java
index 7d69f2c95..f50088ddd 100644
--- a/seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/FilterFieldTransform.java
+++ b/seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/FilterFieldTransform.java
@@ -38,7 +38,7 @@ public class FilterFieldTransform extends AbstractSeaTunnelTransform {
public static final Option<List<String>> KEY_FIELDS = Options.key("fields")
.listType()
.noDefaultValue()
- .withDescription("The fields you want to filter");
+        .withDescription("The list of fields that need to be kept. Fields not in the list will be deleted");
private String[] fields;
private int[] inputValueIndex;