[jira] [Issue Comment Edited] (CASSANDRA-3070) counter repair

Peter Schuller (Issue Comment Edited) (JIRA) Fri, 09 Dec 2011 14:21:06 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166617#comment-13166617
 ]


Peter Schuller edited comment on CASSANDRA-3070 at 12/9/11 10:19 PM:
---------------------------------------------------------------------

This may be relevant, quoting myself from IRC:

{code}
21:20:01 < scode> pcmanus: Hey, are you there?
21:20:21 < scode> pcmanus: I am investigating something which might be https://issues.apache.org/jira/browse/CASSANDRA-3070
21:20:37 < scode> pcmanus: And I could use the help of someone with his brain all over counters, and Stu isn't here atm. :)
21:21:16 < scode> pcmanus: https://gist.github.com/8202cb46c8bd00c8391b
21:21:37 < scode> pcmanus: I am investigating why with CL.ALL and CL.QUORUM, I get seemingly random/varying results when I read a counter.
21:21:53 < scode> pcmanus: I have the offending sstables on a three-node test setup and am inserting debug printouts in the code to trace the reconciliation.
21:21:57 < scode> pcmanus: The gist above shows what's happening.
21:22:11 < scode> pcmanus: The latter is the wrong one, and the former is the correct one.
21:22:28 < scode> pcmanus: The interesting bit is that I see shards with the same node_id *AND* clock, but *DIFFERENT* counts.
21:22:53 < scode> pcmanus: My understanding of counters is that there should never (globally across an entire cluster in all sstables) exist two shards for the same node_id+clock but with different counts.
21:22:57 < scode> pcmanus: Is my understanding correct there?
21:25:10 < scode> pcmanus: There is one node out of the three that has the "offending" card (with a count of 2 instead of 1). Like with 3070, we observed this after having expanded a cluster (though I'm not sure how that would cause it, and we don't know if there existed a problem before the expansion).
{code}
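
For illustration only, the invariant described in the log above (no two shards with the same node_id and clock but different counts) can be checked with a small sketch. This is not Cassandra's counter code: the `Shard` tuple, the `find_conflicting_shards` helper, and the replica data are all hypothetical, chosen only to mirror the "count of 2 instead of 1" case.

```python
from collections import defaultdict
from typing import NamedTuple


class Shard(NamedTuple):
    # Hypothetical, simplified model of a counter shard (an assumption,
    # not Cassandra's on-disk layout).
    node_id: str
    clock: int
    count: int


def find_conflicting_shards(replicas):
    """Return {(node_id, clock): sorted counts} for every key that was
    seen with more than one distinct count across all replicas."""
    counts_by_key = defaultdict(set)
    for shards in replicas.values():
        for shard in shards:
            counts_by_key[(shard.node_id, shard.clock)].add(shard.count)
    return {
        key: sorted(counts)
        for key, counts in counts_by_key.items()
        if len(counts) > 1
    }


# Made-up data: one replica out of three holds a count of 2 instead of 1
# for the same node_id and clock.
replicas = {
    "r1": [Shard("node-a", 5, 1)],
    "r2": [Shard("node-a", 5, 1)],
    "r3": [Shard("node-a", 5, 2)],
}
conflicts = find_conflicting_shards(replicas)
print(conflicts)
```

An empty result would mean the invariant holds for the shards inspected; any entry points at a node_id+clock pair whose counts disagree between replicas.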

                
> counter repair
> --------------
>
>                 Key: CASSANDRA-3070
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3070
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.4
>            Reporter: ivan
>            Assignee: Sylvain Lebresne
>         Attachments: counter_local_quroum_maybeschedulerepairs.txt, 
> counter_local_quroum_maybeschedulerepairs_2.txt, 
> counter_local_quroum_maybeschedulerepairs_3.txt
>
>
> Hi!
> We have some counters out of sync but repair doesn't sync values.
> We tried nodetool repair.
> We use LOCAL_QUORUM for reads. A repair row mutation is sent to the other nodes 
> while reading a bad row, but the counters weren't repaired by the mutation.
> Output from two nodes was uploaded. (Some new debug messages were added.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
