This is an automated email from the ASF dual-hosted git repository.
wanghailin pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new b62b6741ef [Docs][Connector-V2][Doris] Reconstruct the Doris connector document (#4903)
b62b6741ef is described below
commit b62b6741ef584d1e284abac0984198fc691c56c9
Author: Carl-Zhou-CN <[email protected]>
AuthorDate: Fri Jul 28 15:11:55 2023 +0800
[Docs][Connector-V2][Doris] Reconstruct the Doris connector document (#4903)
* [Docs][Connector-V2][Doris] Reconstruct the Doris connector document
---------
Co-authored-by: zhouyao <[email protected]>
---
docs/en/connector-v2/sink/Doris.md | 226 +++++++++++++++++++++++++++++--------
1 file changed, 179 insertions(+), 47 deletions(-)
diff --git a/docs/en/connector-v2/sink/Doris.md b/docs/en/connector-v2/sink/Doris.md
index f586ac3bcc..506cb7f248 100644
--- a/docs/en/connector-v2/sink/Doris.md
+++ b/docs/en/connector-v2/sink/Doris.md
@@ -2,11 +2,24 @@
> Doris sink connector
+## Support Those Engines
+
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
+
+## Key Features
+
+- [x] [exactly-once](../../concept/connector-v2-features.md)
+- [x] [cdc](../../concept/connector-v2-features.md)
+
## Description
Used to send data to Doris. Both support streaming and batch mode.
The internal implementation of Doris sink connector is cached and imported by stream load in batches.
+## Supported DataSource Info
+
:::tip
Version Supported
@@ -17,67 +30,186 @@ Version Supported
:::
-## Key features
-
-- [x] [exactly-once](../../concept/connector-v2-features.md)
-- [x] [cdc](../../concept/connector-v2-features.md)
-
-## Options
-
-| name | type | required | default value |
-|--------------------|--------|----------|---------------|
-| fenodes | string | yes | - |
-| username | string | yes | - |
-| password | string | yes | - |
-| table.identifier | string | yes | - |
-| sink.label-prefix | string | yes | - |
-| sink.enable-2pc | bool | no | true |
-| sink.enable-delete | bool | no | false |
-| doris.config | map | yes | - |
-
-### fenodes [string]
-
-`Doris` cluster fenodes address, the format is `"fe_ip:fe_http_port, ..."`
-
-### username [string]
-
-`Doris` user username
-
-### password [string]
-
-`Doris` user password
+## Sink Options
+
+| Name                | Type   | Required | Default    | Description |
+|---------------------|--------|----------|------------|-------------|
+| fenodes             | String | Yes      | -          | `Doris` cluster fenodes address, the format is `"fe_ip:fe_http_port, ..."` |
+| username            | String | Yes      | -          | `Doris` user username |
+| password            | String | Yes      | -          | `Doris` user password |
+| table.identifier    | String | Yes      | -          | The name of `Doris` table |
+| sink.label-prefix   | String | Yes      | -          | The label prefix used by stream load imports. In the 2pc scenario, global uniqueness is required to ensure the EOS semantics of SeaTunnel. |
+| sink.enable-2pc     | bool   | No       | -          | Whether to enable two-phase commit (2pc), the default is true, to ensure Exactly-Once semantics. For two-phase commit, please refer to [here](https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD). |
+| sink.enable-delete  | bool   | No       | -          | Whether to enable deletion. This option requires the Doris table to enable the batch delete function (enabled by default in version 0.15+), and only supports the Unique model. You can get more detail at this [link](https://doris.apache.org/docs/dev/data-operate/update-delete/batch-delete-manual) |
+| sink.check-interval | int    | No       | 10000      | The interval at which exceptions are checked while loading |
+| sink.max-retries    | int    | No       | 3          | The maximum number of retries if writing records to the database fails |
+| sink.buffer-size    | int    | No       | 256 * 1024 | The buffer size used to cache data for stream load. |
+| sink.buffer-count   | int    | No       | 3          | The number of buffers used to cache data for stream load. |
+| doris.config        | map    | Yes      | -          | This option is used to support operations such as `insert`, `delete`, and `update` when automatically generating SQL, and supported formats. |
+
+## Data Type Mapping
+
+| Doris Data type | SeaTunnel Data type |
+|-----------------|-----------------------------------------|
+| BOOLEAN | BOOLEAN |
+| TINYINT | TINYINT |
+| SMALLINT | SMALLINT<br/>TINYINT |
+| INT | INT<br/>SMALLINT<br/>TINYINT |
+| BIGINT | BIGINT<br/>INT<br/>SMALLINT<br/>TINYINT |
+| LARGEINT | BIGINT<br/>INT<br/>SMALLINT<br/>TINYINT |
+| FLOAT | FLOAT |
+| DOUBLE | DOUBLE<br/>FLOAT |
+| DECIMAL | DECIMAL<br/>DOUBLE<br/>FLOAT |
+| DATE | DATE |
+| DATETIME | TIMESTAMP |
+| CHAR | STRING |
+| VARCHAR | STRING |
+| STRING | STRING |
+| ARRAY | ARRAY |
+| MAP | MAP |
+| JSON | STRING |
+| HLL | Not supported yet |
+| BITMAP | Not supported yet |
+| QUANTILE_STATE | Not supported yet |
+| STRUCT | Not supported yet |
-### table.identifier [string]
-
-The name of `Doris` table
+#### Supported import data formats
-### sink.label-prefix [string]
+The supported formats include CSV and JSON
-The label prefix used by stream load imports. In the 2pc scenario, global uniqueness is required to ensure the EOS semantics of SeaTunnel.
+## Task Example
-### sink.enable-2pc [bool]
+### Simple:
-Whether to enable two-phase commit (2pc), the default is true, to ensure Exactly-Once semantics. For two-phase commit, please refer to [here](https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD).
+> The following example writes multiple data types to Doris. Users need to create the corresponding tables downstream.
-### sink.enable-delete [bool]
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+ checkpoint.interval = 10000
+}
-Whether to enable deletion. This option requires Doris table to enable batch delete function (0.15+ version is enabled by default), and only supports Unique model. you can get more detail at this link:
+source {
+ FakeSource {
+ row.num = 10
+ map.size = 10
+ array.size = 10
+ bytes.length = 10
+ string.length = 10
+ schema = {
+ fields {
+ c_map = "map<string, array<int>>"
+ c_array = "array<int>"
+ c_string = string
+ c_boolean = boolean
+ c_tinyint = tinyint
+ c_smallint = smallint
+ c_int = int
+ c_bigint = bigint
+ c_float = float
+ c_double = double
+ c_decimal = "decimal(16, 1)"
+ c_null = "null"
+ c_bytes = bytes
+ c_date = date
+ c_timestamp = timestamp
+ }
+ }
+ }
+}
-https://doris.apache.org/docs/dev/data-operate/update-delete/batch-delete-manual
+sink {
+ Doris {
+ fenodes = "doris_cdc_e2e:8030"
+ username = root
+ password = ""
+ table.identifier = "test.e2e_table_sink"
+ sink.label-prefix = "test-cdc"
+ sink.enable-2pc = "true"
+ sink.enable-delete = "true"
+ doris.config {
+ format = "json"
+ read_json_by_line = "true"
+ }
+ }
+}
+```
-### doris.config [map]
+### CDC (Change Data Capture) Event:
-The parameter of the stream load `data_desc`, you can get more detail at this link:
+> This example defines a SeaTunnel synchronization task that automatically generates data through FakeSource and sends it to the Doris sink. FakeSource simulates CDC data with a schema that includes score (int type). A sink table named test.e2e_table_sink must be created in Doris to receive the data.
-https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+ checkpoint.interval = 10000
+}
-#### Supported import data formats
+source {
+ FakeSource {
+ schema = {
+ fields {
+ pk_id = bigint
+ name = string
+ score = int
+ sex = boolean
+ number = tinyint
+ height = float
+ sight = double
+ create_time = date
+ update_time = timestamp
+ }
+ }
+ rows = [
+ {
+ kind = INSERT
+ fields = [1, "A", 100, true, 1, 170.0, 4.3, "2020-02-02", "2020-02-02T02:02:02"]
+ },
+ {
+ kind = INSERT
+ fields = [2, "B", 100, true, 1, 170.0, 4.3, "2020-02-02", "2020-02-02T02:02:02"]
+ },
+ {
+ kind = INSERT
+ fields = [3, "C", 100, true, 1, 170.0, 4.3, "2020-02-02", "2020-02-02T02:02:02"]
+ },
+ {
+ kind = UPDATE_BEFORE
+ fields = [1, "A", 100, true, 1, 170.0, 4.3, "2020-02-02", "2020-02-02T02:02:02"]
+ },
+ {
+ kind = UPDATE_AFTER
+ fields = [1, "A_1", 100, true, 1, 170.0, 4.3, "2020-02-02", "2020-02-02T02:02:02"]
+ },
+ {
+ kind = DELETE
+ fields = [2, "B", 100, true, 1, 170.0, 4.3, "2020-02-02", "2020-02-02T02:02:02"]
+ }
+ ]
+ }
+}
-The supported formats include CSV and JSON. Default value: CSV
+sink {
+ Doris {
+ fenodes = "doris_cdc_e2e:8030"
+ username = root
+ password = ""
+ table.identifier = "test.e2e_table_sink"
+ sink.label-prefix = "test-cdc"
+ sink.enable-2pc = "true"
+ sink.enable-delete = "true"
+ doris.config {
+ format = "json"
+ read_json_by_line = "true"
+ }
+ }
+}
-## Example
+```
-Use JSON format to import data
+### Use JSON format to import data
```
sink {
@@ -97,7 +229,7 @@ sink {
```
-Use CSV format to import data
+### Use CSV format to import data
```
sink {