[jira] [Commented] (CASSANDRA-11432) Counter values become under-counted when running repair.

Aleksey Yeschenko (JIRA) Mon, 04 Apr 2016 06:44:43 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224139#comment-15224139 ]


Aleksey Yeschenko commented on CASSANDRA-11432:
-----------------------------------------------

[~dikanggu] As a matter of fact, yes, yes you can (:

1. Is your cluster a fresh 2.2 one? More specifically, does it by any chance 
have counters generated by 2.0 or older?
2. How large is larger than 1%?
3. Can you observe the same thing without repair running?
4. Have you observed any timeouts? What do you do in case of a timeout: ignore 
or retry? Counter updates are not idempotent, so if you retry a timed-out 
increment, you have a real risk of overcounting (in case the update made it, 
but the client timed out). If you ignore it instead, then a missed increment 
would undercount. Another case that would cause an undercount is, of course, a 
retried decrement.
5. What's your commit log sync policy? If periodic, what's the sync period? 
Have you observed any node failures during the experiment that could have 
caused commit log loss?
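
To make the retry-vs-ignore trade-off in point 4 concrete, here is a minimal 
Python sketch. It assumes a single counter and a client that either drops a 
timed-out update or resends it exactly once, with the resend always landing. 
The names `Attempt`, `final_value` and `intended_value` are hypothetical, and 
this is a toy model, not Cassandra's counter write path.

```python
"""Toy model of non-idempotent counter updates under client timeouts.

Hypothetical simplification: one counter; each update attempt is either
applied on the server or not, while the client independently sees either
an ack or a timeout. This is not Cassandra's counter write path.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    delta: int       # +1 for an increment, -1 for a decrement
    applied: bool    # did the server actually apply the first try?
    timed_out: bool  # did the client see a timeout instead of an ack?


def final_value(attempts: List[Attempt], policy: str) -> int:
    """Counter value stored after the client handles timeouts with `policy`.

    "ignore": a timed-out update is dropped by the client.
    "retry":  a timed-out update is resent once; the retry is assumed to land.
    """
    if policy not in ("ignore", "retry"):
        raise ValueError(f"unknown policy: {policy}")
    value = 0
    for a in attempts:
        if a.applied:
            value += a.delta
        if a.timed_out and policy == "retry":
            # The client cannot tell whether the first try landed,
            # so it sends the same delta again.
            value += a.delta
    return value


def intended_value(attempts: List[Attempt]) -> int:
    """Value the counter should hold if every update landed exactly once."""
    return sum(a.delta for a in attempts)


if __name__ == "__main__":
    landed_inc = [Attempt(+1, applied=True, timed_out=True)]
    lost_inc = [Attempt(+1, applied=False, timed_out=True)]
    landed_dec = [Attempt(-1, applied=True, timed_out=True)]
    print(final_value(landed_inc, "retry"), intended_value(landed_inc))  # 2 1: overcount
    print(final_value(lost_inc, "ignore"), intended_value(lost_inc))     # 0 1: undercount
    print(final_value(landed_dec, "retry"), intended_value(landed_dec))  # -2 -1: undercount
```

Neither policy is safe on its own: retrying turns "landed but timed out" into 
a double-apply, ignoring turns "never landed" into a silent loss.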

I've had another look at the code, and nothing popped out at me, really. Gotta 
be either timeouts (maybe you time out more often during repair load?), or 
crashed nodes and subsequent commit log loss. Or, of course, I really am 
missing something esoteric.
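
The commit log loss scenario from point 5 can be sketched the same way. This 
assumes periodic sync mode, a single node that acks a write once it is 
appended to the commit log, fsyncs at t = 0, p, 2p, ..., and a host crash 
that discards everything appended since the last fsync. `surviving_increments` 
is a hypothetical helper, and replication is ignored: whether such a loss 
actually removes an increment cluster-wide depends on the other replicas.

```python
"""Toy model: acknowledged increments lost with a periodic commit log.

Hypothetical assumptions: one node; a write is acknowledged as soon as it
is appended to the commit log; the log is fsynced every sync_period_ms
(at t = 0, p, 2p, ...); a host crash discards every entry appended after
the last fsync. Replication is ignored.
"""


def surviving_increments(write_times_ms, sync_period_ms, crash_time_ms):
    """Return how many acknowledged increments are durable after the crash."""
    if sync_period_ms <= 0:
        raise ValueError("sync_period_ms must be positive")
    # Time of the last fsync at or before the crash.
    last_sync_ms = (crash_time_ms // sync_period_ms) * sync_period_ms
    # Only writes acknowledged by the time of the crash count, and of
    # those only the ones covered by the last fsync survive.
    acked = [t for t in write_times_ms if t <= crash_time_ms]
    return sum(1 for t in acked if t <= last_sync_ms)


if __name__ == "__main__":
    # One increment per ms from t = 1 to t = 25, 10 ms sync period,
    # crash at t = 25: writes after the fsync at t = 20 are lost.
    writes = range(1, 26)
    acked = len(writes)
    kept = surviving_increments(writes, sync_period_ms=10, crash_time_ms=25)
    print(f"acked={acked} durable={kept} lost={acked - kept}")
    # acked=25 durable=20 lost=5
```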

> Counter values become under-counted when running repair.
> --------------------------------------------------------
>
>                 Key: CASSANDRA-11432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11432
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dikang Gu
>            Assignee: Aleksey Yeschenko
>
> We are experimenting with counters in Cassandra 2.2.5. Our setup is 6 
> nodes across three different regions, with a replication factor of 2 in each 
> region. Basically, each node holds a full copy of the data.
> We are writing to the cluster with CL = 2 and reading with CL = 1. 
> We are doing 30k/s counter increments/decrements per node, and in the 
> meantime we are double-writing to our MySQL tier, so that we can measure 
> the accuracy of C* counters compared to MySQL.
> The experiment results were great at the beginning: the counter values in C* 
> and MySQL were very close, with a difference of less than 0.1%. 
> But when we started to run repair on one node, the counter values in C* 
> became much lower than the values in MySQL, and the difference grew to more 
> than 1%.
> My question is: is it a known problem that counter values become 
> under-counted while repair is running? Should we avoid running repair on 
> counter tables?
> Thanks. 
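
For reference, the numbers in the report work out as follows. The topology 
and consistency levels come from the report; R + W > N is the standard 
condition for every read replica set to intersect every write replica set. 
The helper names and the counter values passed to `drift_pct` are 
hypothetical examples, not measurements.

```python
"""Back-of-the-envelope checks on the setup from the report.

Topology and consistency levels are taken from the report. The sample
counter values used below are hypothetical, not measurements.
"""

REGIONS = 3
RF_PER_REGION = 2
NODES = 6
WRITE_CL = 2  # CL = 2 (TWO) for counter writes
READ_CL = 1   # CL = 1 (ONE) for reads


def total_replicas(regions: int, rf_per_region: int) -> int:
    """Replicas of each partition across all regions."""
    return regions * rf_per_region


def read_overlaps_write(r: int, w: int, n: int) -> bool:
    """True if any r replicas must share at least one replica with any w replicas."""
    return r + w > n


def drift_pct(cassandra_value: int, mysql_value: int) -> float:
    """Shortfall of the C* counter relative to the MySQL reference, in percent."""
    return 100.0 * (mysql_value - cassandra_value) / mysql_value


if __name__ == "__main__":
    n = total_replicas(REGIONS, RF_PER_REGION)
    print("replicas per partition:", n)                  # 6
    print("every node holds a full copy:", n == NODES)  # True
    print("CL ONE read guaranteed to see CL TWO write:",
          read_overlaps_write(READ_CL, WRITE_CL, n))    # False
    # Hypothetical values just above the 1% drift mentioned in the report.
    print(f"drift: {drift_pct(989_000, 1_000_000):.2f}%")  # 1.10%
```

Since 1 + 2 is not greater than 6, a single CL = 1 read can return a value 
that does not yet reflect recent increments, so C*/MySQL comparisons are 
easiest to interpret once writes have stopped.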



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
