[ 
https://issues.apache.org/jira/browse/BEAM-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-5964 started by Gleb Kanterov.
-------------------------------------------
> Add ClickHouseIO.Write
> ----------------------
>
>                 Key: BEAM-5964
>                 URL: https://issues.apache.org/jira/browse/BEAM-5964
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-ideas
>            Reporter: Gleb Kanterov
>            Assignee: Gleb Kanterov
>            Priority: Major
>          Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> ClickHouse is an open-source columnar DBMS for OLAP. It allows analysis of 
> data that is updated in real time. The project was released as open-source 
> software under the Apache 2 license in June 2016.
> h3. Design and implementation
> 1. Support writes only; reads aren't useful because ClickHouse is designed 
> for OLAP queries
> 2. Write in batches and rely on the idempotent and atomic inserts 
> supported by replicated tables in ClickHouse
> 3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>
> 4. Rely on the logic for casting rows between schemas from BEAM-5918 
> rather than putting it in ClickHouseIO.Write
> h3. References
> [1] 
> http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
> [2] 
> https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
> [3] 
> https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/
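
Design item 2 above (batched writes that lean on idempotent inserts) can be sketched roughly as follows. This is only an illustration in plain Java, not the ClickHouseIO implementation: BatchingWriter and DedupingTable are hypothetical names, the sink stands in for a single INSERT, and the in-memory table merely imitates the assumption that a replicated table ignores a block identical to one it has already accepted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers rows and hands them to a sink in batches. */
class BatchingWriter<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> sink; // stands in for one INSERT statement
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int maxBatchSize, Consumer<List<T>> sink) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Adds one row and flushes once the buffer reaches the batch size. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Sends the buffered rows as one block; the buffer is kept if the sink throws. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so the block is stable
        buffer.clear();
    }
}

/** In-memory stand-in (assumption) for a table that drops a repeated identical block. */
class DedupingTable<T> {
    private final Set<List<T>> seenBlocks = new HashSet<>();
    private final List<T> rows = new ArrayList<>();

    /** Applies the whole block or, if the same block was seen before, nothing. */
    void insert(List<T> block) {
        if (seenBlocks.add(new ArrayList<>(block))) {
            rows.addAll(block);
        }
    }

    int size() {
        return rows.size();
    }
}
```

With a batch size of 2, adding three rows sends one block of two rows and leaves one row buffered until flush() is called. Re-sending an already accepted block, as a retried bundle might, leaves the table unchanged, which is the property the design relies on.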



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)