This is an automated email from the ASF dual-hosted git repository.
wanghailin pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new e5da7e840d [Feature][doc][Connector-V2][Common] Add Common connector documentation (#5453)
e5da7e840d is described below
commit e5da7e840d67f0e845166d2ab8c9c70afe143cdb
Author: ZhilinLi <[email protected]>
AuthorDate: Sat Jun 15 10:02:13 2024 +0800
[Feature][doc][Connector-V2][Common] Add Common connector documentation (#5453)
---
docs/en/connector-v2/sink/common-options.md | 21 +++-----
docs/en/connector-v2/source/common-options.md | 78 +++++++++++++++++++++------
docs/en/transform-v2/common-options.md | 66 ++++++++++++++++++-----
3 files changed, 125 insertions(+), 40 deletions(-)
diff --git a/docs/en/connector-v2/sink/common-options.md b/docs/en/connector-v2/sink/common-options.md
index 2addc49278..bfcdc26a2b 100644
--- a/docs/en/connector-v2/sink/common-options.md
+++ b/docs/en/connector-v2/sink/common-options.md
@@ -2,24 +2,19 @@
> Common parameters of sink connectors
-| name | type | required | default value |
-|-------------------|--------|----------|---------------|
-| source_table_name | string | no | - |
-| parallelism | int | no | - |
+| Name              | Type   | Required | Default | Description |
+|-------------------|--------|----------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| source_table_name | String | No | - | When `source_table_name` is not specified, the current plugin processes the data set `dataset` output by the previous plugin in the configuration file <br/> When `source_table_name` is specified, the current plugin processes the data set corresponding to this parameter. |
-### source_table_name [string]
+## Important note
-When `source_table_name` is not specified, the current plug-in processes the data set `dataset` output by the previous plugin in the configuration file;
+When a job configures `source_table_name`, you must also set the `result_table_name` parameter on the upstream plugin
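
A minimal sketch of this pairing (the plugin names and table name here are illustrative, not part of this change):

```bash
source {
  FakeSource {
    # Register this source's output as a temporary table
    result_table_name = "fake"
  }
}

sink {
  Console {
    # Must match a result_table_name registered by an upstream plugin
    source_table_name = "fake"
  }
}
```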
-When `source_table_name` is specified, the current plug-in is processing the data set corresponding to this parameter.
+## Task Example
-### parallelism [int]
+### Simple
-When `parallelism` is not specified, the `parallelism` in env is used by default.
-
-When parallelism is specified, it will override the parallelism in env.
-
-## Examples
+> This example passes one data source through two transforms and sends the two resulting pipelines to different sinks
```bash
source {
diff --git a/docs/en/connector-v2/source/common-options.md b/docs/en/connector-v2/source/common-options.md
index a9e607b28e..079f40663a 100644
--- a/docs/en/connector-v2/source/common-options.md
+++ b/docs/en/connector-v2/source/common-options.md
@@ -2,32 +2,80 @@
> Common parameters of source connectors
-| name | type | required | default value |
-|-------------------|--------|----------|---------------|
-| result_table_name | string | no | - |
-| parallelism | int | no | - |
+| Name              | Type   | Required | Default | Description [...]
+|-------------------|--------|----------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------[...]
+| result_table_name | String | No | - | When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set `(dataStream/dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)` <br/>When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataStream/dataset)` that can be directly accessed by other plugins, or called a temporary table `(table [...]
+| parallelism | Int | No | - | When `parallelism` is not specified, the `parallelism` in env is used by default. <br/>When `parallelism` is specified, it will override the parallelism in env. [...]
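
A sketch of the `parallelism` override described above (values and plugin name are illustrative, not part of this change):

```bash
env {
  # Default parallelism applied to every plugin in this job
  parallelism = 2
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    # Overrides the env-level parallelism for this source only
    parallelism = 4
  }
}
```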
-### result_table_name [string]
+## Important note
-When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set `(dataStream/dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)` ;
+When a job configures `result_table_name`, you must also set the `source_table_name` parameter on the downstream plugin
-When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataStream/dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)` . The data set `(dataStream/dataset)` registered here can be directly accessed by other plugins by specifying `source_table_name` .
+## Task Example
-### parallelism [int]
+### Simple
-When `parallelism` is not specified, the `parallelism` in env is used by default.
-
-When parallelism is specified, it will override the parallelism in env.
-
-## Example
+> This example registers a stream or batch data source and names the registered table `fake_table`
```bash
source {
FakeSourceStream {
- result_table_name = "fake"
+ result_table_name = "fake_table"
}
}
```
-> The result of the data source `FakeSourceStream` will be registered as a temporary table named `fake` . This temporary table can be used by any `Transform` or `Sink` plugin by specifying `source_table_name` .
+### Multiple Pipeline Example
+
+> This example transforms the data source `fake` and writes it to two different sinks
+
+```bash
+env {
+ job.mode = "BATCH"
+}
+
+source {
+ FakeSource {
+ result_table_name = "fake"
+ row.num = 100
+ schema = {
+ fields {
+ id = "int"
+ name = "string"
+ age = "int"
+ c_timestamp = "timestamp"
+ c_date = "date"
+ c_map = "map<string, string>"
+ c_array = "array<int>"
+ c_decimal = "decimal(30, 8)"
+ c_row = {
+ c_row = {
+ c_int = int
+ }
+ }
+ }
+ }
+ }
+}
+
+transform {
+ Sql {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ # the table name in the query must be the same as 'source_table_name'
+ query = "select id, regexp_replace(name, '.+', 'b') as name, age+1 as age, pi() as pi, c_timestamp, c_date, c_map, c_array, c_decimal, c_row from fake"
+ }
+ # The SQL transform supports basic functions and criteria operations
+ # Complex SQL is not supported yet, including multi-source table/row JOINs, AGGREGATE operations, and the like
+}
+
+sink {
+ Console {
+ source_table_name = "fake1"
+ }
+ Console {
+ source_table_name = "fake"
+ }
+}
+```
diff --git a/docs/en/transform-v2/common-options.md b/docs/en/transform-v2/common-options.md
index c45b4ba167..ce88ce8528 100644
--- a/docs/en/transform-v2/common-options.md
+++ b/docs/en/transform-v2/common-options.md
@@ -1,23 +1,65 @@
# Transform Common Options
-> Common parameters of source connectors
+> This is an intermediate conversion step between the source and sink terminals; you can use SQL statements to complete the conversion smoothly
-| name | type | required | default value |
-|-------------------|--------|----------|---------------|
-| result_table_name | string | no | - |
-| source_table_name | string | no | - |
+| Name              | Type   | Required | Default | Description [...]
+|-------------------|--------|----------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------[...]
+| result_table_name | String | No | - | When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set that can be directly accessed by other plugins, or called a temporary table `(table)`; <br/>When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)` . The dataset registered here [...]
+| source_table_name | String | No | - | When `source_table_name` is not specified, the current plugin processes the data set `(dataset)` output by the previous plugin in the configuration file; <br/>When `source_table_name` is specified, the current plugin processes the data set corresponding to this parameter. [...]
-### source_table_name [string]
+## Task Example
-When `source_table_name` is not specified, the current plug-in processes the data set `(dataset)` output by the previous plug-in in the configuration file;
+### Simple
-When `source_table_name` is specified, the current plugin is processing the data set corresponding to this parameter.
+> This example converts the data source `fake` and writes it to two different sinks; for details, refer to `transform`
-### result_table_name [string]
+```bash
+env {
+ job.mode = "BATCH"
+}
-When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set that can be directly accessed by other plugins, or called a temporary table `(table)`;
+source {
+ FakeSource {
+ result_table_name = "fake"
+ row.num = 100
+ schema = {
+ fields {
+ id = "int"
+ name = "string"
+ age = "int"
+ c_timestamp = "timestamp"
+ c_date = "date"
+ c_map = "map<string, string>"
+ c_array = "array<int>"
+ c_decimal = "decimal(30, 8)"
+ c_row = {
+ c_row = {
+ c_int = int
+ }
+ }
+ }
+ }
+ }
+}
-When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataset)` that can be directly accessed by other plugins, or called a temporary table `(table)` . The dataset registered here can be directly accessed by other plugins by specifying `source_table_name` .
+transform {
+ Sql {
+ source_table_name = "fake"
+ result_table_name = "fake1"
+ # the table name in the query must be the same as 'source_table_name'
+ query = "select id, regexp_replace(name, '.+', 'b') as name, age+1 as age, pi() as pi, c_timestamp, c_date, c_map, c_array, c_decimal, c_row from fake"
+ }
+ # The SQL transform supports basic functions and criteria operations
+ # Complex SQL is not supported yet, including multi-source table/row JOINs, AGGREGATE operations, and the like
+}
-## Examples
+sink {
+ Console {
+ source_table_name = "fake1"
+ }
+ Console {
+ source_table_name = "fake"
+ }
+}
+```