[
https://issues.apache.org/jira/browse/CASSANDRA-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-3256:
----------------------------------------
Attachment: 3256.patch
My guess is that this is probably harmless. Basically, the node received a
Merkle tree for a session that it doesn't know about. This means that the
repair session in question has been interrupted. _A priori_, I see only two
things that can cause this:
* The repair thread on the host has been interrupted. But 1) in that case you
should have found an exception earlier on saying "Interrupted while waiting for
repair: repair will continue in the background." and 2) I don't see what could
interrupt that thread.
* The node restarted and only now receives a response to a request made before
the restart. This is imho the more likely scenario. And if so, the previous
repair (the one started before the restart) won't succeed, but there is no
further consequence than that.
That being said, throwing an assertion error is probably not such a great idea
given that it's a scenario that can happen. Attaching a patch that simply logs
a hopefully more explicit message.
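
To illustrate the idea (this is not the content of 3256.patch, only a minimal
sketch of the general approach), the snippet below replaces an assertion on an
unknown session with a logged warning. The class name, the session registry,
and the message text are all hypothetical and do not mirror the actual
AntiEntropyService code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical stand-in for the service that tracks active repair sessions.
class RendezvousSketch {
    private static final Logger logger =
            Logger.getLogger(RendezvousSketch.class.getName());

    // Assumed registry: session id -> callback to run when a tree arrives.
    private final Map<String, Runnable> sessions = new ConcurrentHashMap<>();

    void register(String sessionId, Runnable onTreeReceived) {
        sessions.put(sessionId, onTreeReceived);
    }

    // Returns true if the tree was handed to a known session, false otherwise.
    boolean rendezvous(String sessionId) {
        Runnable session = sessions.get(sessionId);
        if (session == null) {
            // Previously this case would have been "assert session != null".
            // A response for an unknown session can legitimately arrive,
            // for example after a restart, so log it and move on.
            logger.warning("Got a merkle tree response for unknown repair session "
                    + sessionId + ": either this node has been restarted since the "
                    + "session was started, or the session has been interrupted");
            return false;
        }
        session.run();
        return true;
    }
}
```

With this shape, a late response after a restart produces one warning in the
log instead of a fatal exception in the AntiEntropyStage thread.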
> AssertionError when repairing a node
> ------------------------------------
>
> Key: CASSANDRA-3256
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3256
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.8.5
> Reporter: Jason Harvey
> Assignee: Sylvain Lebresne
> Priority: Minor
> Labels: repair
> Fix For: 0.8.7
>
> Attachments: 3256.patch
>
>
> When repairing a node, the following exception was thrown two times:
> {code}
> ERROR [AntiEntropyStage:2] 2011-09-23 23:00:24,016 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[AntiEntropyStage:2,5,main]
> java.lang.AssertionError
>     at org.apache.cassandra.service.AntiEntropyService.rendezvous(AntiEntropyService.java:170)
>     at org.apache.cassandra.service.AntiEntropyService.access$100(AntiEntropyService.java:90)
>     at org.apache.cassandra.service.AntiEntropyService$TreeResponseVerbHandler.doVerb(AntiEntropyService.java:518)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> {code}
> No other errors occurred on the node. From peeking at the code, this
> assertion appears to simply check if an existing repair session could be
> found. Interestingly, the repair did continue to run after this as evidenced
> by several other AntiEntropyService entries in the log.
> 8 node ring with an RF of 3, if that matters at all. No other nodes in the
> ring threw exceptions.