[
https://issues.apache.org/jira/browse/CASSANDRA-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095321#comment-13095321
]
Sylvain Lebresne commented on CASSANDRA-3070:
---------------------------------------------
The last txt file you attached is just a copy of your comment.
Now, what you describe is roughly what I got from the previous log. But the
thing is, there is nothing wrong with the way the counter values are resolved.
There seems to be a value in there that shouldn't exist, though. So the truth
is, without a way to reproduce it, it will be harder to find what could be wrong
in there. Do attach the newly generated log though; there could be something
slightly different that will help. I'll continue to look the code in the eyes
and see if I find something.
A few questions, though, that could help narrow it down:
* You said that 2 of the servers return a lower number. Can you be sure, however,
that the "right" value should be the greater one? For instance, do you only do
increments > 1? Or better, do you have another source that would allow you to
tell what the right value is?
* Does that happen with many counters? Your initial description does suggest
it happens to more than one, but do you have an idea of how frequent it is? And
if you have multiple bad counters, are the nodes that are out of sync always the
same nodes?
* You marked that 0.8.4 is affected, but was the cluster started on
0.8.4? Or more precisely, do you have an example of a counter that is
problematic and that you are sure was created *after* your upgrade to 0.8.4?
(I want to be sure I can definitively rule out some unfortunate consequence of
CASSANDRA-2968, even though I doubt this could be it.)
* Just to be sure, you did not remove an sstable by mistake or something like
that? Or truncate the counter column family?
Last thing: if there is indeed more than one problematic counter, attaching
output logs for at least two of them would be helpful. There could be
some similarity that helps in finding what's wrong.
> counter repair
> --------------
>
> Key: CASSANDRA-3070
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3070
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.4
> Reporter: ivan
> Assignee: Sylvain Lebresne
> Attachments: counter_local_quroum_maybeschedulerepairs.txt,
> counter_local_quroum_maybeschedulerepairs_2.txt
>
>
> Hi!
> We have some counters out of sync, but repair doesn't sync the values.
> We tried nodetool repair.
> We use LOCAL_QUORUM for reads. A repair row mutation is sent to the other nodes
> when reading a bad row, but the counters weren't repaired by the mutation.
> The output of two nodes was uploaded. (Some new debug messages were added.)