[ 
https://issues.apache.org/jira/browse/BEAM-5964?focusedWorklogId=175005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-175005
 ]

ASF GitHub Bot logged work on BEAM-5964:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Dec/18 19:38
            Start Date: 13/Dec/18 19:38
    Worklog Time Spent: 10m 
    Work Description: kanterov commented on issue #7006: [BEAM-5964] Add ClickHouseIO.Write
URL: https://github.com/apache/beam/pull/7006#issuecomment-447093952
 
 
   @chamikaramj sorry for the delay. I'm addressing the review comments, and the
change is close to being finished.
   
   I went with the more straightforward approach we discussed in the comments.
It works well so far, and the code became much cleaner. I'm running tests to
measure the performance impact, and I will get back to you when the results are
clearer.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 175005)
    Time Spent: 5.5h  (was: 5h 20m)

> Add ClickHouseIO.Write
> ----------------------
>
>                 Key: BEAM-5964
>                 URL: https://issues.apache.org/jira/browse/BEAM-5964
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-ideas
>            Reporter: Gleb Kanterov
>            Assignee: Gleb Kanterov
>            Priority: Major
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> h3. Motivation
> ClickHouse is an open-source columnar DBMS for OLAP. It allows analysis of
> data that is updated in real time. The project was released as open-source
> software under the Apache 2 license in June 2016.
> h3. Design and implementation
> 1. Support only writes; reads aren't useful because ClickHouse is designed for
> OLAP queries.
> 2. Write in batches and rely on the idempotent, atomic inserts supported by
> replicated tables in ClickHouse, so that retrying a failed batch does not
> insert its rows twice.
> 3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>.
> 4. Rely on the logic for casting rows between schemas from BEAM-5918 instead
> of putting it in ClickHouseIO.Write.
> h3. References
> [1] http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
> [2] https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
> [3] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/
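Point 2 of the design in the issue description (batched writes plus idempotent inserts) can be illustrated with a small, self-contained Java sketch. This is a hypothetical example, not the ClickHouseIO.Write implementation from PR #7006: `BatchingWriter` and `SimulatedReplicatedTable` are invented names, and the table only mimics the block deduplication that ClickHouse replicated tables apply to inserts (real ClickHouse compares block hashes and only remembers a bounded number of recent blocks, whereas this sketch remembers every block). It shows why the combination matters: when a batch is retried, the duplicate block is skipped instead of being inserted a second time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these classes exist in Beam or in ClickHouse client libraries. */
public class BatchingWriteSketch {

  /** Stand-in for a replicated table: a block identical to one already seen is skipped. */
  static class SimulatedReplicatedTable {
    private final List<String> rows = new ArrayList<>();
    private final Set<List<String>> seenBlocks = new HashSet<>();

    /** Inserts the whole block at once; returns false if it was treated as a duplicate. */
    boolean insertBlock(List<String> block) {
      // Copy first so later changes to the caller's buffer cannot affect stored data.
      List<String> copy = List.copyOf(block);
      if (!seenBlocks.add(copy)) {
        return false; // same rows in the same order: assumed to be a retry, not re-inserted
      }
      rows.addAll(copy);
      return true;
    }

    int rowCount() {
      return rows.size();
    }
  }

  /** Buffers rows and flushes them as a single block once batchSize rows are buffered. */
  static class BatchingWriter {
    private final SimulatedReplicatedTable table;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BatchingWriter(SimulatedReplicatedTable table, int batchSize) {
      this.table = table;
      this.batchSize = batchSize;
    }

    void write(String row) {
      buffer.add(row);
      if (buffer.size() >= batchSize) {
        flush();
      }
    }

    /** Sends any buffered rows as one block; also called once at the end to drain a partial batch. */
    void flush() {
      if (buffer.isEmpty()) {
        return;
      }
      table.insertBlock(buffer);
      buffer.clear();
    }
  }

  public static void main(String[] args) {
    SimulatedReplicatedTable table = new SimulatedReplicatedTable();
    BatchingWriter writer = new BatchingWriter(table, 2);
    for (String row : List.of("a", "b", "c", "d", "e")) {
      writer.write(row);
    }
    writer.flush(); // drains the trailing partial batch ["e"]

    // Simulate a retried batch: the same block arrives again and is skipped.
    boolean inserted = table.insertBlock(List.of("a", "b"));
    System.out.println(table.rowCount() + " rows, retry inserted: " + inserted);
  }
}
```

Running `main` prints `5 rows, retry inserted: false`. This approach has a trade-off: deduplication only protects against duplicates when a retry re-sends exactly the same rows in the same order, so batch boundaries need to be stable across retries.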



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
