[ https://issues.apache.org/jira/browse/CASSANDRA-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990297#comment-12990297 ]
Hudson commented on CASSANDRA-2072:
-----------------------------------
Integrated in Cassandra-0.7 #244 (See
[https://hudson.apache.org/hudson/job/Cassandra-0.7/244/])
Fix race condition during decommission by announcing for RING_DELAY and
not removing endpoint state until removing the ep from
justRemovedEndpoints.
Patch by brandonwilliams, reviewed by gdusbabek for CASSANDRA-2072
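
The second half of the commit message (keeping endpoint state until the
endpoint leaves justRemovedEndpoints) can be pictured with the minimal sketch
below. It is not the Gossiper code from the attached patches: the class
RemovedEndpointTracker, its methods, and the use of plain strings for
endpoints and states are hypothetical, the quarantine length is a constructor
parameter rather than the real RING_DELAY value, and rejecting gossip for a
quarantined endpoint is an assumption made for the illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Illustrative only; all names here are hypothetical, not Cassandra's API. */
final class RemovedEndpointTracker {
    private final long quarantineMillis;
    // endpoint -> last known gossip state (a string stands in for the real state object)
    private final Map<String, String> endpointState = new HashMap<>();
    // endpoint -> time (ms) at which it was removed from the ring
    private final Map<String, Long> justRemovedEndpoints = new HashMap<>();

    RemovedEndpointTracker(long quarantineMillis) {
        this.quarantineMillis = quarantineMillis;
    }

    /** Record state for an endpoint that is not being removed. */
    void learn(String endpoint, String state) {
        endpointState.put(endpoint, state);
    }

    /** Quarantine the endpoint, but keep its state for now. */
    void removeEndpoint(String endpoint, long nowMillis) {
        justRemovedEndpoints.put(endpoint, nowMillis);
    }

    /** Assumed behaviour: gossip about a quarantined endpoint is ignored. */
    boolean accept(String endpoint, String state) {
        if (justRemovedEndpoints.containsKey(endpoint)) {
            return false;
        }
        endpointState.put(endpoint, state);
        return true;
    }

    /** Drop quarantine entries older than the delay, and only then drop their state. */
    void expire(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = justRemovedEndpoints.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > quarantineMillis) {
                it.remove();
                endpointState.remove(entry.getKey());
            }
        }
    }

    boolean hasState(String endpoint) {
        return endpointState.containsKey(endpoint);
    }

    boolean isQuarantined(String endpoint) {
        return justRemovedEndpoints.containsKey(endpoint);
    }
}
```

In this sketch, a node that has just removed B still holds B's state while B
is quarantined, so a stale "leaving" report relayed by a third node during
that window is rejected instead of being treated as news about an unknown
endpoint.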
> Race condition during decommission
> ----------------------------------
>
> Key: CASSANDRA-2072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2072
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.7.0
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Priority: Minor
> Fix For: 0.7.2
>
> Attachments:
> 0001-announce-having-left-the-ring-for-RING_DELAY-on-deco.patch,
> 0002-Improve-TRACE-logging-for-Gossiper.patch,
> 0003-Remove-endpoint-state-when-expiring-justRemovedEndpo.patch
>
>
> Occasionally, decommissioning a node triggers a race condition in which
> another node never removes the decommissioned node's token and therefore
> propagates it again with a state of down. With CASSANDRA-1900 we can solve
> this, but it shouldn't occur in the first place.
> Given nodes A, B, and C, if you decommission B, it will stream to A and C.
> When streaming completes, B decommissions and logs this stack trace:
> ERROR 00:02:40,282 Fatal exception in thread Thread[Thread-5,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:62)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:387)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91
> At this point A shows that it is removing B's token, but C does not.
> Instead, C's failure detector reports that B is dead, and nodetool ring on
> C shows B in a leaving/down state. In another gossip round, C propagates
> this state back to A.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira