[ 
https://issues.apache.org/jira/browse/BEAM-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683721#comment-16683721
 ] 

Gleb Kanterov commented on BEAM-5964:
-------------------------------------

[~iemejia] thanks for feedback! In my experience, plain jdbc part of 
clickhouse-jdbc is not ideal because it hides the important internals of 
ClickHouse behind generic JDBC abstraction. I took the path of using binary 
protocol, that is much more efficient and has the full support of all features. 
I'm wrapping up the pull request and going to ask the author of clickhouse-jdbc 
to be one of the reviewers.

Regarding, SDF, if I understand correctly it makes sense only for reading, but 
not for writing, is it right?

> Add ClickHouseIO.Write
> ----------------------
>
>                 Key: BEAM-5964
>                 URL: https://issues.apache.org/jira/browse/BEAM-5964
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-ideas
>            Reporter: Gleb Kanterov
>            Assignee: Gleb Kanterov
>            Priority: Major
>
> h3. Motivation
> ClickHouse is open-source columnar DBMS for OLAP. It allows analysis of data 
> that is updated in real time. The project was released as open-source 
> software under the Apache 2 license in June 2016.
> h3. Design and implementation
> 1. Do only writes, reads aren't useful because ClickHouse is designed for 
> OLAP queries
> 2. For writes, do write in batches and rely on idempotent and atomic inserts 
> supported by replicated tables in ClickHouse
> 3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>
> 4. Rely on having logic for casting rows between schemas in BEAM-5918, and 
> don't put it in ClickHouseIO.Write
> h3. References
> [1] 
> http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
> [2] 
> https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
> [3] 
> https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to