Gleb Kanterov created BEAM-5964:
-----------------------------------

             Summary: Add ClickHouseIO.Write
                 Key: BEAM-5964
                 URL: https://issues.apache.org/jira/browse/BEAM-5964
             Project: Beam
          Issue Type: New Feature
          Components: io-ideas
            Reporter: Gleb Kanterov
            Assignee: Eugene Kirpichov


h3. Motivation

ClickHouse is an open-source columnar DBMS for OLAP. It allows analysis of data 
that is updated in real time. The project was released as open-source software 
under the Apache 2 license in June 2016.

h3. Design and implementation

1. Support writes only; reads aren't useful because ClickHouse is designed for 
OLAP queries
2. Write in batches and rely on the idempotent and atomic inserts supported by 
replicated tables in ClickHouse
3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>
4. Rely on the logic for casting rows between schemas from BEAM-5918 rather 
than putting it in ClickHouseIO.Write
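
As a rough illustration of point 2, the sketch below shows one way the batching 
could work, in plain Java with no Beam or ClickHouse dependencies. 
BatchingWriter, maxBatchSize, and the sink callback are hypothetical names made 
up for this example, not part of any existing API; in a real transform the sink 
would issue an INSERT against ClickHouse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a Beam or ClickHouse class): buffers elements
 * and hands them to a sink in batches of at most maxBatchSize.
 */
final class BatchingWriter<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> sink; // assumed to perform one atomic insert per call
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int maxBatchSize, Consumer<List<T>> sink) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Buffers one element and flushes once the batch is full. */
    void add(T element) {
        buffer.add(element);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Sends the buffered elements as one batch; does nothing if the buffer is empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so the sink never observes later changes to the buffer.
        sink.accept(new ArrayList<>(buffer));
        // Clear only after the sink returned normally, so a failed batch
        // stays buffered and can be re-sent unchanged.
        buffer.clear();
    }
}
```

The buffer is cleared only after the sink returns, so if an insert fails the 
same rows can be sent again by calling flush() once more. That retry is safe 
only under the idempotent-insert assumption from point 2. In a Beam DoFn, add 
would presumably be called once per element and flush at the end of each 
bundle.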

h3. References

[1] http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
[2] https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
[3] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
