Gleb Kanterov created BEAM-5964:
-----------------------------------
Summary: Add ClickHouseIO.Write
Key: BEAM-5964
URL: https://issues.apache.org/jira/browse/BEAM-5964
Project: Beam
Issue Type: New Feature
Components: io-ideas
Reporter: Gleb Kanterov
Assignee: Eugene Kirpichov
h3. Motivation
ClickHouse is an open-source columnar DBMS for OLAP. It allows analysis of data
that is updated in real time. The project was released as open-source software
under the Apache 2 license in June 2016.
h3. Design and implementation
1. Support only writes; reads aren't useful because ClickHouse is designed for
OLAP queries
2. Write in batches, and rely on the idempotent and atomic inserts supported by
replicated tables in ClickHouse
3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>
4. Rely on the logic for casting rows between schemas from BEAM-5918 instead of
putting it in ClickHouseIO.Write
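
A minimal sketch of the batching idea in point 2, in plain Java without any Beam or ClickHouse dependency. The class name BatchingWriter, its parameters batchSize and maxAttempts, and the Consumer sink are all hypothetical; the sink stands in for a single INSERT into ClickHouse. The sketch assumes that resending a batch with identical content is safe, which is the idempotent-insert property the design relies on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Beam or ClickHouse): buffers elements and
 * hands them to a sink in fixed-size batches, retrying a failed batch as-is.
 */
final class BatchingWriter<T> {
    private final int batchSize;
    private final int maxAttempts;
    private final Consumer<List<T>> sink; // assumption: one call == one INSERT
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, int maxAttempts, Consumer<List<T>> sink) {
        if (batchSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("batchSize and maxAttempts must be >= 1");
        }
        this.batchSize = batchSize;
        this.maxAttempts = maxAttempts;
        this.sink = sink;
    }

    /** Buffers one element and flushes once the buffer reaches batchSize. */
    void add(T element) {
        buffer.add(element);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends the buffered elements as one batch; a no-op when the buffer is empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Immutable snapshot: every retry sends exactly the same content,
        // so an idempotent sink can treat a repeated batch as a duplicate.
        List<T> batch = Collections.unmodifiableList(new ArrayList<>(buffer));
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sink.accept(batch);
                buffer.clear();
                return;
            } catch (RuntimeException e) {
                last = e; // keep the most recent failure and try again
            }
        }
        throw last; // all attempts failed; the buffer is left intact
    }
}
```

In an actual transform this logic would probably live inside a DoFn, with flush() also called when a bundle finishes so that a partial batch is not lost; that wiring is left out here.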
h3. References
[1] http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
[2] https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
[3] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)