[jira] [Commented] (CASSANDRA-3070) counter repair

Sylvain Lebresne (JIRA) Thu, 01 Sep 2011 07:41:33 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095321#comment-13095321
 ]


Sylvain Lebresne commented on CASSANDRA-3070:
---------------------------------------------

The last txt file you attached is just a copy of your comment.

Now, what you describe is roughly what I got from the previous log. But the 
thing is, there is nothing wrong with the way the counter values are resolved. 
There seems to be a value in there that shouldn't exist, though. So the truth 
is that without a way to reproduce it, it will be harder to find what could be 
wrong in there. Do attach the newly generated log though; there could be 
something slightly different that will help. I'll keep staring at the code to 
see if I find something.

A few questions, though, that could help narrow it down:
* You said that 2 of the servers return a lower number. Can you be sure, 
however, that the "right" value is the greater one? For instance, do you only 
do increments > 1? Or better, do you have another source that would allow you 
to tell what the right value is?
* Does that happen with many counters? Your initial description does suggest 
it happens to more than one, but do you have an idea of how frequent it is? And 
if you have multiple bad counters, are the nodes that are out of sync always 
the same nodes?
* You marked that 0.8.4 is affected, but was the cluster started on 0.8.4? Or 
more precisely, do you have an example of a counter that is problematic and 
that you are sure was created *after* your upgrade to 0.8.4? (I want to be 
sure I can definitively rule out some unfortunate consequence of 
CASSANDRA-2968, even though I doubt this could be it.)
* Just to be sure, you did not remove an sstable by mistake or something like 
that? Or truncate the counter column family?

Last thing: if there is indeed more than one problematic counter, attaching 
output logs for at least two of them would be helpful. There could be some 
similarity that helps find what's wrong.

> counter repair
> --------------
>
>                 Key: CASSANDRA-3070
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3070
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.4
>            Reporter: ivan
>            Assignee: Sylvain Lebresne
>         Attachments: counter_local_quroum_maybeschedulerepairs.txt, 
> counter_local_quroum_maybeschedulerepairs_2.txt
>
>
> Hi!
> We have some counters out of sync but repair doesn't sync values.
> We tried nodetool repair.
> We use LOCAL_QUORUM for reads. A repair row mutation is sent to other nodes 
> while reading a bad row, but the counters weren't repaired by the mutation.
> Output of two nodes were uploaded. (Some new debug messages were added.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
