EricJoy2048 commented on code in PR #5085: URL: https://github.com/apache/seatunnel/pull/5085#discussion_r1266153642
########## docs/en/connector-v2/sink/Clickhouse.md: ########## @@ -2,91 +2,49 @@ > Clickhouse sink connector -## Description +## Support Those Engines -Used to write data to Clickhouse. +> Spark<br/> +> Flink<br/> +> SeaTunnel Zeta<br/> -## Key features +## Key Features - [ ] [exactly-once](../../concept/connector-v2-features.md) - -The Clickhouse sink plug-in can achieve accuracy once by implementing idempotent writing, and needs to cooperate with aggregatingmergetree and other engines that support deduplication. - - [x] [cdc](../../concept/connector-v2-features.md) -## Options - -| name | type | required | default value | -|---------------------------------------|---------|----------|---------------| -| host | string | yes | - | -| database | string | yes | - | -| table | string | yes | - | -| username | string | yes | - | -| password | string | yes | - | -| clickhouse.config | map | no | | -| bulk_size | string | no | 20000 | -| split_mode | string | no | false | -| sharding_key | string | no | - | -| primary_key | string | no | - | -| support_upsert | boolean | no | false | -| allow_experimental_lightweight_delete | boolean | no | false | -| common-options | | no | - | - -### host [string] - -`ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"` . - -### database [string] - -The `ClickHouse` database - -### table [string] - -The table name - -### username [string] - -`ClickHouse` user username - -### password [string] - -`ClickHouse` user password +> The Clickhouse sink plug-in can achieve accuracy once by implementing idempotent writing, and needs to cooperate with aggregatingmergetree and other engines that support deduplication. 
-### clickhouse.config [map] - -In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc` , users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc` . - -### bulk_size [number] - -The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) each time, the `default is 20000`, if checkpoints are enabled, writing will also occur at the times when the checkpoints are satisfied . - -### split_mode [boolean] - -This mode only support clickhouse table which engine is 'Distributed'.And `internal_replication` option -should be `true`. They will split distributed table data in seatunnel and perform write directly on each shard. The shard weight define is clickhouse will be -counted. - -### sharding_key [string] - -When use split_mode, which node to send data to is a problem, the default is random selection, but the -'sharding_key' parameter can be used to specify the field for the sharding algorithm. This option only -worked when 'split_mode' is true. - -### primary_key [string] - -Mark the primary key column from clickhouse table, and based on primary key execute INSERT/UPDATE/DELETE to clickhouse table - -### support_upsert [boolean] - -Support upsert row by query primary key - -### allow_experimental_lightweight_delete [boolean] - -Allow experimental lightweight delete based on `*MergeTree` table engine +## Description -### common options +Used to write data to Clickhouse. -Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details +## Supported DataSource Info + +In order to use the Clickhouse connector, the following dependencies are required. +They can be downloaded via install-plugin.sh or from the Maven central repository. 
+ +| Datasource | Supported Versions | Dependency | +|------------|--------------------|------------------------------------------------------------------------------------------------------------------| +| Clickhouse | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) | + +## Sink Options Review Comment: You lost **Data Type Mapper**? ########## docs/en/connector-v2/source/Clickhouse.md: ########## @@ -2,61 +2,66 @@ > Clickhouse source connector -## Description +## Support Those Engines -Used to read data from Clickhouse. +> Spark<br/> +> Flink<br/> +> SeaTunnel Zeta<br/> -## Key features +## Key Features - [x] [batch](../../concept/connector-v2-features.md) - [ ] [stream](../../concept/connector-v2-features.md) - [ ] [exactly-once](../../concept/connector-v2-features.md) - [x] [column projection](../../concept/connector-v2-features.md) - -supports query SQL and can achieve projection effect. - - [ ] [parallelism](../../concept/connector-v2-features.md) - [ ] [support user-defined split](../../concept/connector-v2-features.md) -## Options - -| name | type | required | default value | -|------------------|--------|----------|------------------------| -| host | string | yes | - | -| database | string | yes | - | -| sql | string | yes | - | -| username | string | yes | - | -| password | string | yes | - | -| server_time_zone | string | no | ZoneId.systemDefault() | -| common-options | | no | - | - -### host [string] - -`ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"` . - -### database [string] - -The `ClickHouse` database - -### sql [string] +> supports query SQL and can achieve projection effect. 
-The query sql used to search data though Clickhouse server - -### username [string] - -`ClickHouse` user username - -### password [string] - -`ClickHouse` user password - -### server_time_zone [string] - -The session time zone in database server. If not set, then ZoneId.systemDefault() is used to determine the server time zone. +## Description -### common options +Used to read data from Clickhouse. -Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details +## Supported DataSource Info + +In order to use the Clickhouse connector, the following dependencies are required. +They can be downloaded via install-plugin.sh or from the Maven central repository. + +| Datasource | Supported Versions | Dependency | +|------------|--------------------|------------------------------------------------------------------------------------------------------------------| +| Clickhouse | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) | + +## Data Type Mapping + +| Clickhouse Data type | SeaTunnel Data type | +|--------------------------------------------------------|---------------------| +| String / IP / UUID /Enum | STRING | +| UInt8 | BOOLEAN | +| FixedString | BINARY | +| Int32 / UInt16 / Interval | INTEGER | +| Int8 | TINYINT | +| Int64 | BIGINT | +| Int16 / UInt8 | SMALLINT | +| Float64 | DOUBLE | +| Decimal / Int128 / Int256 / UInt64 / UInt128 / UInt256 | DECIMAL | +| Float32 | FLOAT | +| Date | Date | +| Timestamp | Timestamp | +| DateTime | Time | +| Array | ARRAY | + +## Source Options + +| Name | Type | Required | Default | Description | +|------------------|--------|----------|------------------------|------------------------------------------------------------------------------------------------------------------------------------------| +| host | String | Yes | - | `ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` 
to be specified. Such as `"host1:8123,host2:8123"` . | +| database | String | Yes | - | The `ClickHouse` database | +| sql | String | Yes | - | The query sql used to search data though Clickhouse server | +| username | String | Yes | - | `ClickHouse` user username | +| password | String | Yes | - | `ClickHouse` user password | +| server_time_zone | String | No | ZoneId.systemDefault() | The session time zone in database server. If not set, then ZoneId.systemDefault() is used to determine the server time zone. | +| common-options | | No | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details | ## Examples Review Comment: Same as above. ########## docs/en/connector-v2/sink/Clickhouse.md: ########## @@ -2,91 +2,49 @@ > Clickhouse sink connector -## Description +## Support Those Engines -Used to write data to Clickhouse. +> Spark<br/> +> Flink<br/> +> SeaTunnel Zeta<br/> -## Key features +## Key Features - [ ] [exactly-once](../../concept/connector-v2-features.md) - -The Clickhouse sink plug-in can achieve accuracy once by implementing idempotent writing, and needs to cooperate with aggregatingmergetree and other engines that support deduplication. 
- - [x] [cdc](../../concept/connector-v2-features.md) -## Options - -| name | type | required | default value | -|---------------------------------------|---------|----------|---------------| -| host | string | yes | - | -| database | string | yes | - | -| table | string | yes | - | -| username | string | yes | - | -| password | string | yes | - | -| clickhouse.config | map | no | | -| bulk_size | string | no | 20000 | -| split_mode | string | no | false | -| sharding_key | string | no | - | -| primary_key | string | no | - | -| support_upsert | boolean | no | false | -| allow_experimental_lightweight_delete | boolean | no | false | -| common-options | | no | - | - -### host [string] - -`ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"` . - -### database [string] - -The `ClickHouse` database - -### table [string] - -The table name - -### username [string] - -`ClickHouse` user username - -### password [string] - -`ClickHouse` user password +> The Clickhouse sink plug-in can achieve accuracy once by implementing idempotent writing, and needs to cooperate with aggregatingmergetree and other engines that support deduplication. -### clickhouse.config [map] - -In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc` , users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc` . - -### bulk_size [number] - -The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) each time, the `default is 20000`, if checkpoints are enabled, writing will also occur at the times when the checkpoints are satisfied . - -### split_mode [boolean] - -This mode only support clickhouse table which engine is 'Distributed'.And `internal_replication` option -should be `true`. 
They will split distributed table data in seatunnel and perform write directly on each shard. The shard weight define is clickhouse will be -counted. - -### sharding_key [string] - -When use split_mode, which node to send data to is a problem, the default is random selection, but the -'sharding_key' parameter can be used to specify the field for the sharding algorithm. This option only -worked when 'split_mode' is true. - -### primary_key [string] - -Mark the primary key column from clickhouse table, and based on primary key execute INSERT/UPDATE/DELETE to clickhouse table - -### support_upsert [boolean] - -Support upsert row by query primary key - -### allow_experimental_lightweight_delete [boolean] - -Allow experimental lightweight delete based on `*MergeTree` table engine +## Description -### common options +Used to write data to Clickhouse. -Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details +## Supported DataSource Info + +In order to use the Clickhouse connector, the following dependencies are required. +They can be downloaded via install-plugin.sh or from the Maven central repository. 
+ +| Datasource | Supported Versions | Dependency | +|------------|--------------------|------------------------------------------------------------------------------------------------------------------| +| Clickhouse | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) | + +## Sink Options + +| Name | Type | Required | Default | Description | +|---------------------------------------|---------|----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| host | String | Yes | - | `ClickHouse` cluster address, the format is `host:port` , allowing multiple `hosts` to be specified. Such as `"host1:8123,host2:8123"`. | +| database | String | Yes | - | The `ClickHouse` database. | +| table | String | Yes | - | The table name. | +| username | String | Yes | - | `ClickHouse` user username. | +| password | String | Yes | - | `ClickHouse` user password. | +| clickhouse.config | Map | No | | In addition to the above mandatory parameters that must be specified by `clickhouse-jdbc` , users can also specify multiple optional parameters, which cover all the [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) provided by `clickhouse-jdbc`. | +| bulk_size | String | No | 20000 | The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) each time, the `default is 20000`. | +| split_mode | String | No | false | This mode only support clickhouse table which engine is 'Distributed'.And `internal_replication` option-should be `true`.They will split distributed table data in seatunnel and perform write directly on each shard. 
The shard weight define is clickhouse will counted. |
+| sharding_key | String | No | - | When use split_mode, which node to send data to is a problem, the default is random selection, but the 'sharding_key' parameter can be used to specify the field for the sharding algorithm. This option only worked when 'split_mode' is true. |
+| primary_key | String | No | - | Mark the primary key column from clickhouse table, and based on primary key execute INSERT/UPDATE/DELETE to clickhouse table. |
+| support_upsert | Boolean | No | false | Support upsert row by query primary key. |
+| allow_experimental_lightweight_delete | Boolean | No | false | Allow experimental lightweight delete based on `*MergeTree` table engine. |
+| common-options | | No | - | Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details. |

 ## Examples

Review Comment: The first example needs to contain `env`, `source`, and `transform` sections. Also, add a link so readers know how to install SeaTunnel and run this example.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
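To illustrate what the reviewer is asking for, a complete job config with `env`, `source`, `transform`, and the Clickhouse sink might look like the sketch below. This is not the PR's actual example: the `FakeSource` row count, field names, and all connection values (`localhost:8123`, `default`, `t_user`) are placeholder assumptions; only the sink option names come from the option table quoted in the diff above.

```hocon
# Sketch of a full SeaTunnel batch job writing to Clickhouse.
# Host, credentials, table name, and the FakeSource schema are placeholders.
env {
  execution.parallelism = 1
  job.mode = "BATCH"
}

source {
  # FakeSource generates synthetic rows for testing the pipeline end to end
  FakeSource {
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  # No transform is needed for this sketch; the empty block shows where one would go.
}

sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "t_user"
    username = "default"
    password = ""
    bulk_size = 20000
  }
}
```

Per the second part of the comment, the `## Examples` section could also link to SeaTunnel's deployment/quick-start documentation so readers know how to install SeaTunnel and submit a config like this.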
