This is an automated email from the ASF dual-hosted git repository.
fanjia pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
     new 70ec3a360 [Improve][Docs][Clickhouse] Reconstruct the clickhouse connector doc (#5085)
70ec3a360 is described below
commit 70ec3a3608d91359c89cbb9c9da89b6f041ef7d7
Author: monster <[email protected]>
AuthorDate: Wed Jul 19 17:25:45 2023 +0800
[Improve][Docs][Clickhouse] Reconstruct the clickhouse connector doc (#5085)
---------
Co-authored-by: chenzy15 <[email protected]>
---
docs/en/connector-v2/sink/Clickhouse.md | 207 +++++++++++++++---------------
docs/en/connector-v2/source/Clickhouse.md | 129 ++++++++++---------
2 files changed, 168 insertions(+), 168 deletions(-)
diff --git a/docs/en/connector-v2/sink/Clickhouse.md b/docs/en/connector-v2/sink/Clickhouse.md
index 7c4bab991..27bf274c7 100644
--- a/docs/en/connector-v2/sink/Clickhouse.md
+++ b/docs/en/connector-v2/sink/Clickhouse.md
@@ -2,95 +2,110 @@
> Clickhouse sink connector
-## Description
+## Support Those Engines
-Used to write data to Clickhouse.
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
-## Key features
+## Key Features
- [ ] [exactly-once](../../concept/connector-v2-features.md)
-
-The Clickhouse sink plug-in can achieve accuracy once by implementing idempotent writing, and needs to cooperate with aggregatingmergetree and other engines that support deduplication.
-
- [x] [cdc](../../concept/connector-v2-features.md)
-## Options
-
-| name | type | required | default value |
-|---------------------------------------|---------|----------|---------------|
-| host | string | yes | - |
-| database | string | yes | - |
-| table | string | yes | - |
-| username | string | yes | - |
-| password | string | yes | - |
-| clickhouse.config | map | no | |
-| bulk_size | string | no | 20000 |
-| split_mode | string | no | false |
-| sharding_key | string | no | - |
-| primary_key | string | no | - |
-| support_upsert | boolean | no | false |
-| allow_experimental_lightweight_delete | boolean | no | false |
-| common-options | | no | - |
-
-### host [string]
-
-`ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"` .
-
-### database [string]
-
-The `ClickHouse` database
-
-### table [string]
-
-The table name
-
-### username [string]
-
-`ClickHouse` user username
-
-### password [string]
-
-`ClickHouse` user password
-
-### clickhouse.config [map]
-
-In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc` , users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc` .
-
-### bulk_size [number]
-
-The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) each time, the `default is 20000`, if checkpoints are enabled, writing will also occur at the times when the checkpoints are satisfied .
-
-### split_mode [boolean]
-
-This mode only support clickhouse table which engine is 'Distributed'.And `internal_replication` option
-should be `true`. They will split distributed table data in seatunnel and perform write directly on each shard. The shard weight define is clickhouse will be
-counted.
-
-### sharding_key [string]
+> The Clickhouse sink plug-in can achieve exactly-once semantics through idempotent writing, which needs to cooperate with AggregatingMergeTree and other table engines that support deduplication.
-When use split_mode, which node to send data to is a problem, the default is random selection, but the
-'sharding_key' parameter can be used to specify the field for the sharding algorithm. This option only
-worked when 'split_mode' is true.
-
-### primary_key [string]
-
-Mark the primary key column from clickhouse table, and based on primary key execute INSERT/UPDATE/DELETE to clickhouse table
-
-### support_upsert [boolean]
+## Description
-Support upsert row by query primary key
+Used to write data to Clickhouse.
-### allow_experimental_lightweight_delete [boolean]
+## Supported DataSource Info
+
+In order to use the Clickhouse connector, the following dependencies are required.
+They can be downloaded via install-plugin.sh or from the Maven central repository.
+
+| Datasource | Supported Versions | Dependency |
+|------------|--------------------|------------------------------------------------------------------------------------------------------------------|
+| Clickhouse | universal          | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) |
+
+## Data Type Mapping
+
+| SeaTunnel Data type | Clickhouse Data type |
+|---------------------|----------------------|
+| STRING | String / Int128 / UInt128 / Int256 / UInt256 / Point / Ring / Polygon / MultiPolygon |
+| INT | Int8 / UInt8 / Int16 / UInt16 / Int32 |
+| BIGINT | UInt64 / Int64 / IntervalYear / IntervalQuarter / IntervalMonth / IntervalWeek / IntervalDay / IntervalHour / IntervalMinute / IntervalSecond |
+| DOUBLE | Float64 |
+| DECIMAL | Decimal |
+| FLOAT | Float32 |
+| DATE | Date |
+| TIME | DateTime |
+| ARRAY | Array |
+| MAP | Map |
+
+## Sink Options
+
+| Name | Type | Required | Default | Description |
+|---------------------------------------|---------|----------|---------|-------------|
+| host | String | Yes | - | `ClickHouse` cluster address, the format is `host:port`, allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"`. |
+| database | String | Yes | - | The `ClickHouse` database. |
+| table | String | Yes | - | The table name. |
+| username | String | Yes | - | `ClickHouse` user username. |
+| password | String | Yes | - | `ClickHouse` user password. |
+| clickhouse.config | Map | No | | In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc`, users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. |
+| bulk_size | String | No | 20000 | The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) each time; the default is 20000. |
+| split_mode | String | No | false | This mode only supports a ClickHouse table whose engine is 'Distributed', and the `internal_replication` option should be `true`. SeaTunnel will split the distributed table data and write directly on each shard. The shard weight defined in ClickHouse will be counted. |
+| sharding_key | String | No | - | When using split_mode, data is sent to a randomly selected node by default, but the 'sharding_key' parameter can be used to specify the field for the sharding algorithm. This option only works when 'split_mode' is true. |
+| primary_key | String | No | - | Mark the primary key column from the ClickHouse table, and execute INSERT/UPDATE/DELETE against the ClickHouse table based on the primary key. |
+| support_upsert | Boolean | No | false | Support upsert rows by querying the primary key. |
+| allow_experimental_lightweight_delete | Boolean | No | false | Allow experimental lightweight delete based on the `*MergeTree` table engine. |
+| common-options | | No | - | Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details. |
+
+## How to Create a Clickhouse Data Synchronization Job
+
+The following example demonstrates how to create a data synchronization job that writes randomly generated data to a Clickhouse database:
+
+```bash
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+ checkpoint.interval = 1000
+}
-Allow experimental lightweight delete based on `*MergeTree` table engine
+source {
+ FakeSource {
+ row.num = 2
+ bigint.min = 0
+ bigint.max = 10000000
+ split.num = 1
+ split.read-interval = 300
+ schema {
+ fields {
+ c_bigint = bigint
+ }
+ }
+ }
+}
-### common options
+sink {
+ Clickhouse {
+ host = "127.0.0.1:9092"
+ database = "default"
+ table = "test"
+ username = "xxxxx"
+ password = "xxxxx"
+ }
+}
+```
-Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details
+### Tips
-## Examples
+> 1. [SeaTunnel Deployment Document](../../start-v2/locally/deployment.md).<br/>
+> 2. The table to be written to needs to be created in advance before synchronization.<br/>
+> 3. When the sink is writing to the ClickHouse table, you don't need to set its schema because the connector will query ClickHouse for the current table's schema information before writing.<br/>
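
As a sketch of tip 2, the target table for the example job above could be created with a DDL like the following. The table name, column, and engine choice are illustrative only (they mirror the `default.test` sink and the single `c_bigint` field of the FakeSource schema); adapt them to your own setup:

```sql
-- Hypothetical DDL for the example job's target table (default.test).
-- Pick a MergeTree-family engine that fits your deduplication needs.
CREATE TABLE IF NOT EXISTS default.test
(
    c_bigint Int64
)
ENGINE = MergeTree()
ORDER BY c_bigint;
```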
-Simple
+## Clickhouse Sink Config
```hocon
sink {
@@ -98,9 +113,9 @@ sink {
host = "localhost:8123"
database = "default"
table = "fake_all"
- username = "default"
- password = ""
- clickhouse.confg = {
+ username = "xxxxx"
+ password = "xxxxx"
+ clickhouse.config = {
max_rows_to_read = "100"
read_overflow_mode = "throw"
}
@@ -108,7 +123,7 @@ sink {
}
```
-Split mode
+## Split Mode
```hocon
sink {
@@ -116,8 +131,8 @@ sink {
host = "localhost:8123"
database = "default"
table = "fake_all"
- username = "default"
- password = ""
+ username = "xxxxx"
+ password = "xxxxx"
# split mode options
split_mode = true
@@ -126,7 +141,7 @@ sink {
}
```
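
Building on the split-mode example above, a sink can also pin the sharding algorithm to a specific field via `sharding_key`. A sketch (the field name `age` is illustrative; the option only takes effect when `split_mode` is true):

```hocon
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "fake_all"
    username = "xxxxx"
    password = "xxxxx"
    # split mode options
    split_mode = true
    # hypothetical field used to choose the target shard
    sharding_key = "age"
  }
}
```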
-CDC(Change data capture)
+## CDC(Change data capture) Sink
```hocon
sink {
@@ -134,8 +149,8 @@ sink {
host = "localhost:8123"
database = "default"
table = "fake_all"
- username = "default"
- password = ""
+ username = "xxxxx"
+ password = "xxxxx"
# cdc options
primary_key = "id"
@@ -144,7 +159,7 @@ sink {
}
```
-CDC(Change data capture) for *MergeTree engine
+## CDC(Change data capture) for *MergeTree engine
```hocon
sink {
@@ -152,8 +167,8 @@ sink {
host = "localhost:8123"
database = "default"
table = "fake_all"
- username = "default"
- password = ""
+ username = "xxxxx"
+ password = "xxxxx"
# cdc options
primary_key = "id"
@@ -163,21 +178,3 @@ sink {
}
```
-## Changelog
-
-### 2.2.0-beta 2022-09-26
-
-- Add ClickHouse Sink Connector
-
-### 2.3.0-beta 2022-10-20
-
-- [Improve] Clickhouse Support Int128,Int256 Type ([3067](https://github.com/apache/seatunnel/pull/3067))
-
-### next version
-
-- [Improve] Clickhouse Sink support nest type and array type([3047](https://github.com/apache/seatunnel/pull/3047))
-- [Improve] Clickhouse Sink support geo type([3141](https://github.com/apache/seatunnel/pull/3141))
-- [Feature] Support CDC write DELETE/UPDATE/INSERT events ([3653](https://github.com/apache/seatunnel/pull/3653))
-- [Improve] Remove Clickhouse Fields Config ([3826](https://github.com/apache/seatunnel/pull/3826))
-- [Improve] Change Connector Custom Config Prefix To Map [3719](https://github.com/apache/seatunnel/pull/3719)
-
diff --git a/docs/en/connector-v2/source/Clickhouse.md b/docs/en/connector-v2/source/Clickhouse.md
index 07384875c..7596bf72a 100644
--- a/docs/en/connector-v2/source/Clickhouse.md
+++ b/docs/en/connector-v2/source/Clickhouse.md
@@ -2,93 +2,96 @@
> Clickhouse source connector
-## Description
+## Support Those Engines
-Used to read data from Clickhouse.
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
-## Key features
+## Key Features
- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [x] [column projection](../../concept/connector-v2-features.md)
-
-supports query SQL and can achieve projection effect.
-
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)
-## Options
-
-| name | type | required | default value |
-|------------------|--------|----------|------------------------|
-| host | string | yes | - |
-| database | string | yes | - |
-| sql | string | yes | - |
-| username | string | yes | - |
-| password | string | yes | - |
-| server_time_zone | string | no | ZoneId.systemDefault() |
-| common-options | | no | - |
-
-### host [string]
-
-`ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"` .
-
-### database [string]
-
-The `ClickHouse` database
-
-### sql [string]
-
-The query sql used to search data though Clickhouse server
-
-### username [string]
-
-`ClickHouse` user username
-
-### password [string]
-
-`ClickHouse` user password
+> supports query SQL and can achieve projection effect.
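
Column projection here simply means listing the wanted columns in the query instead of `select *`, so only those columns are read from ClickHouse. For example (column names are illustrative, based on the query used in this doc's example):

```sql
-- Only name and age are fetched from the server, not the full row
select name, age from test where age = 20 limit 100
```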
-### server_time_zone [string]
-
-The session time zone in database server. If not set, then ZoneId.systemDefault() is used to determine the server time zone.
-
-### common options
+## Description
-Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details
+Used to read data from Clickhouse.
-## Examples
+## Supported DataSource Info
+
+In order to use the Clickhouse connector, the following dependencies are required.
+They can be downloaded via install-plugin.sh or from the Maven central repository.
+
+| Datasource | Supported Versions | Dependency |
+|------------|--------------------|------------------------------------------------------------------------------------------------------------------|
+| Clickhouse | universal          | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) |
+
+## Data Type Mapping
+
+| Clickhouse Data type | SeaTunnel Data type |
+|----------------------|---------------------|
+| String / Int128 / UInt128 / Int256 / UInt256 / Point / Ring / Polygon / MultiPolygon | STRING |
+| Int8 / UInt8 / Int16 / UInt16 / Int32 | INT |
+| UInt64 / Int64 / IntervalYear / IntervalQuarter / IntervalMonth / IntervalWeek / IntervalDay / IntervalHour / IntervalMinute / IntervalSecond | BIGINT |
+| Float64 | DOUBLE |
+| Decimal | DECIMAL |
+| Float32 | FLOAT |
+| Date | DATE |
+| DateTime | TIME |
+| Array | ARRAY |
+| Map | MAP |
+
+## Source Options
+
+| Name | Type | Required | Default | Description |
+|------------------|--------|----------|------------------------|-------------|
+| host | String | Yes | - | `ClickHouse` cluster address, the format is `host:port`, allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"`. |
+| database | String | Yes | - | The `ClickHouse` database. |
+| sql | String | Yes | - | The query sql used to search data through Clickhouse server. |
+| username | String | Yes | - | `ClickHouse` user username. |
+| password | String | Yes | - | `ClickHouse` user password. |
+| server_time_zone | String | No | ZoneId.systemDefault() | The session time zone in database server. If not set, then ZoneId.systemDefault() is used to determine the server time zone. |
+| common-options | | No | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. |
+
+## How to Create a Clickhouse Data Synchronization Job
+
+The following example demonstrates how to create a data synchronization job that reads data from Clickhouse and prints it on the local client:
+
+```bash
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
-```hocon
+# Create a source to connect to Clickhouse
source {
-
Clickhouse {
host = "localhost:8123"
database = "default"
sql = "select * from test where age = 20 limit 100"
- username = "default"
- password = ""
+ username = "xxxxx"
+ password = "xxxxx"
server_time_zone = "UTC"
result_table_name = "test"
}
-
}
-```
-
-## Changelog
-### 2.2.0-beta 2022-09-26
-
-- Add ClickHouse Source Connector
-
-### 2.3.0-beta 2022-10-20
-
-- [Improve] Clickhouse Source random use host when config multi-host ([3108](https://github.com/apache/seatunnel/pull/3108))
-
-### next version
+# Console printing of the read Clickhouse data
+sink {
+ Console {
+ parallelism = 1
+ }
+}
+```
-- [Improve] Clickhouse Source support nest type and array type([3047](https://github.com/apache/seatunnel/pull/3047))
+### Tips
-- [Improve] Clickhouse Source support geo type([3141](https://github.com/apache/seatunnel/pull/3141))
+> 1.[SeaTunnel Deployment Document](../../start-v2/locally/deployment.md).