[GitHub] [incubator-seatunnel] BenJFan opened a new issue #1382: [Feature][Connector] Add clickhouse-file sink to support ClickHouse bulk load

GitBox Thu, 03 Mar 2022 01:45:42 -0800


BenJFan opened a new issue #1382:
URL: https://github.com/apache/incubator-seatunnel/issues/1382



   ### Search before asking
   
   - [X] I have searched the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) issues and found no similar feature request.
   
   
   ### Description
   
   ## Summary
   When massive volumes of data are written to ClickHouse, the traditional JDBC path cannot sustain that throughput. Similar to HBase's bulk load feature, SeaTunnel can support ClickHouse by writing data files directly.
   ## Plan
   The original proposal: https://github.com/ClickHouse/ClickHouse/issues/10473
   Our plan:
   
![Clickhouse_bulk_load](https://user-images.githubusercontent.com/32387433/156533534-48dd0124-cb53-4ff2-b5a4-966ec6831283.png)
   #### Details:
   1. Create the table using `clickhouse-local`.
   2. Receive data from upstream.
   3. Execute INSERT SQL to write the received data into the `clickhouse-local` table.
   4. Use zero-copy transfer to send the resulting data files to the ClickHouse server, under `${clickhouse_data_location}/${database}/${table}/detached`.
   5. Use the ClickHouse [ATTACH statements](https://clickhouse.com/docs/zh/sql-reference/statements/attach/) to make the data files queryable.
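The plan does not spell out the SQL for each step, so the following is only a minimal sketch of what steps 1, 3, 4 and 5 might generate. It assumes the `clickhouse-local` table mirrors the server table's columns and engine (so the parts it writes are compatible), that identifiers are plain names needing no escaping, and that step 5 uses `ALTER TABLE ... ATTACH PART` (a real ClickHouse statement for MergeTree-family tables). The function names, the data location and the part name in the example are hypothetical; the zero-copy transfer in step 4 is not shown.

```python
import posixpath
import re

# This sketch only accepts plain identifiers, so no identifier escaping is needed.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name):
    """Validate a database/table/column name and wrap it in backticks."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def _literal(value):
    """Render a Python int, float or str as a ClickHouse SQL literal."""
    if isinstance(value, bool):
        raise TypeError("bool values are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # ClickHouse string literals use single quotes with backslash escapes.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def create_table_sql(database, table, columns, engine_clause):
    """Step 1 (hypothetical): DDL to run inside clickhouse-local."""
    cols = ", ".join(f"{_ident(name)} {ch_type}" for name, ch_type in columns)
    return [
        f"CREATE DATABASE IF NOT EXISTS {_ident(database)}",
        f"CREATE TABLE {_ident(database)}.{_ident(table)} ({cols}) ENGINE = {engine_clause}",
    ]


def insert_sql(database, table, column_names, rows):
    """Step 3 (hypothetical): one INSERT ... VALUES for a batch of upstream rows."""
    if not rows:
        raise ValueError("empty batch")
    names = ", ".join(_ident(n) for n in column_names)
    values = ", ".join("(" + ", ".join(_literal(v) for v in row) + ")" for row in rows)
    return f"INSERT INTO {_ident(database)}.{_ident(table)} ({names}) VALUES {values}"


def detached_dir(data_location, database, table):
    """Step 4: server-side target directory, following the plan's path template."""
    return posixpath.join(data_location, database, table, "detached")


def attach_part_sql(database, table, part_name):
    """Step 5: make one copied part visible to queries on the server."""
    return f"ALTER TABLE {_ident(database)}.{_ident(table)} ATTACH PART {_literal(part_name)}"


if __name__ == "__main__":
    for stmt in create_table_sql("db", "events", [("id", "UInt64"), ("name", "String")],
                                 "MergeTree ORDER BY id"):
        print(stmt)
    print(insert_sql("db", "events", ["id", "name"], [(1, "a"), (2, "it's")]))
    print(detached_dir("/var/lib/clickhouse/data", "db", "events"))
    print(attach_part_sql("db", "events", "all_1_1_0"))
```

Generating plain SQL strings keeps the sketch independent of how `clickhouse-local` is invoked; a real sink would more likely stream rows in a bulk input format than build large VALUES lists.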
   ## Options
   | name           | type   | required | default value |
   | -------------- | ------ | -------- | ------------- |
   | bulk_size      | number | no       | 100000        |
   | database       | string | yes      | -             |
   | clickhouse_local_path  | string | yes      | -             |
   | fields         | array  | no       | -             |
   | host           | string | yes      | -             |
   | password       | string | no       | -             |
   | table          | string | yes      | -             |
   | username       | string | no       | -             |
   | common-options | string | no       | -             |
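As a minimal sketch of how the Options table could be applied, the snippet below enforces the required keys and the `bulk_size` default listed in the table, then splits upstream rows into batches. Treating `bulk_size` as "rows per INSERT into `clickhouse-local`" is an assumption; the issue only gives its default. `resolve_options` and `batches` are hypothetical names, and the remaining optional keys are passed through untouched.

```python
from itertools import islice

# Required keys and defaults taken from the Options table.
REQUIRED = ("database", "clickhouse_local_path", "host", "table")
DEFAULTS = {"bulk_size": 100000}


def resolve_options(user_options):
    """Check required options and fill in defaults (hypothetical helper)."""
    missing = [key for key in REQUIRED if key not in user_options]
    if missing:
        raise ValueError("missing required option(s): " + ", ".join(missing))
    opts = {**DEFAULTS, **user_options}
    size = opts["bulk_size"]
    if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
        raise ValueError("bulk_size must be a positive integer")
    return opts


def batches(rows, bulk_size):
    """Yield lists of at most bulk_size rows (assumed batching semantics)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, bulk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    opts = resolve_options({
        "database": "db",
        "table": "events",
        "host": "localhost:8123",
        "clickhouse_local_path": "/usr/bin/clickhouse-local",
    })
    print(opts["bulk_size"])
    print(list(batches(range(5), 2)))
```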
   ## Known problems
   1. The `clickhouse-local` program must be installed at the same path on every Spark node before the SeaTunnel application starts.
   2. Not all table engines work well with this approach; we will make the MergeTree family and the Distributed engine work first.
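The issue lists these as known limitations rather than checks, but a sink could fail fast on both. The snippet below is a hypothetical pre-flight sketch: it verifies that `clickhouse_local_path` points to an executable file on the current node, and it accepts an engine name (for example the `engine` column of `system.tables`, or a full engine clause) only if it belongs to the MergeTree family or is `Distributed`. Matching any name ending in `MergeTree` is a simplification; whether every variant (e.g. `ReplicatedMergeTree`) works with this approach is not established here.

```python
import os
import re

# Leading identifier of an engine name or full engine clause.
_ENGINE_NAME = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)")


def check_clickhouse_local(path):
    """Problem 1: raise if clickhouse-local is missing or not executable here."""
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise FileNotFoundError(f"clickhouse-local not found or not executable at {path}")


def engine_supported(engine):
    """Problem 2: accept only the MergeTree family and Distributed, for now."""
    match = _ENGINE_NAME.match(engine)
    if match is None:
        return False
    name = match.group(1)
    return name.endswith("MergeTree") or name == "Distributed"


if __name__ == "__main__":
    print(engine_supported("ReplacingMergeTree"))
    print(engine_supported("Distributed('cluster', 'db', 'events', rand())"))
    print(engine_supported("Log"))
```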
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


