[ 
https://issues.apache.org/jira/browse/CASSANDRA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-3256:
----------------------------------------

    Attachment: 3256.patch

My guess is that this is probably harmless. Basically the node received a 
merkle tree for a session that it doesn't know about. This means that the said 
repair session has been interrupted. _A priori_, I see only two things that can 
cause this:
  * The repair thread on the host has been interrupted. But 1) in that case you 
should have found an exception earlier on saying "Interrupted while waiting for 
repair: repair will continue in the background." and 2) I don't see what could 
interrupt that thread.
  * The node restarted and only now receives a response to a request made before 
the restart. This is imho the more likely scenario. And if so, the previous 
repair (the one started before the restart) won't succeed, but there is no 
further consequence.

That being said, throwing an assertion error is probably not such a great idea 
given that it's a scenario that can happen. Attaching a patch that simply logs a 
hopefully more explicit message.
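
To illustrate the idea (this is only a rough sketch, not the contents of 3256.patch): the class RepairSessionRegistry, its methods, and the log wording below are all hypothetical stand-ins, and the real AntiEntropyService code is structured differently. The point is just that a tree response for an unknown session gets logged and dropped rather than tripping an assertion.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical stand-in for the session bookkeeping; not the actual Cassandra code.
final class RepairSessionRegistry {
    private static final Logger logger =
        Logger.getLogger(RepairSessionRegistry.class.getName());

    // Active repair sessions by name. The value is the last tree received,
    // modelled as a plain String to keep the sketch self-contained.
    private final Map<String, String> sessions = new ConcurrentHashMap<>();

    // Called when a repair session starts on this node.
    void register(String sessionName) {
        sessions.put(sessionName, "");
    }

    // Called when a merkle tree response arrives.
    // Returns true if the tree was handed to a known session, false if it was dropped.
    boolean rendezvous(String sessionName, String merkleTree) {
        if (!sessions.containsKey(sessionName)) {
            // Previously this situation would have failed an assertion. It can
            // legitimately happen (e.g. the node restarted mid-repair), so log and ignore.
            logger.warning("Got a merkle tree response for unknown repair session "
                + sessionName + ": the session was probably interrupted (node restart?)."
                + " Ignoring response.");
            return false;
        }
        sessions.put(sessionName, merkleTree);
        return true;
    }
}
```

With something along these lines, a late response after a restart would show up as a single warning in the log instead of a "Fatal exception" stack trace.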

> AssertionError when repairing a node
> ------------------------------------
>
>                 Key: CASSANDRA-3256
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3256
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.5
>            Reporter: Jason Harvey
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
>             Fix For: 0.8.7
>
>         Attachments: 3256.patch
>
>
> When repairing a node, the following exception was thrown two times:
> {code}
> ERROR [AntiEntropyStage:2] 2011-09-23 23:00:24,016 
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
> Thread[AntiEntropyStage:2,5,main]
> java.lang.AssertionError
>         at 
> org.apache.cassandra.service.AntiEntropyService.rendezvous(AntiEntropyService.java:170)
>         at 
> org.apache.cassandra.service.AntiEntropyService.access$100(AntiEntropyService.java:90)
>         at 
> org.apache.cassandra.service.AntiEntropyService$TreeResponseVerbHandler.doVerb(AntiEntropyService.java:518)
>         at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> No other errors occurred on the node. From peeking at the code, this 
> assertion appears to simply check if an existing repair session could be 
> found. Interestingly, the repair did continue to run after this as evidenced 
> by several other AntiEntropyService entries in the log.
> 8 node ring with an RF of 3, if that matters at all. No other nodes in the 
> ring threw exceptions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
