[ https://issues.apache.org/jira/browse/CASSANDRA-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2072:
----------------------------------------

    Attachment: 0003-Remove-endpoint-state-when-expiring-justRemovedEndpo.patch
                0002-Improve-TRACE-logging-for-Gossiper.patch
                0001-announce-having-left-the-ring-for-RING_DELAY-on-deco.patch

Here is what is happening:

B sends LEFT to C, and C calls removeEndpoint, which drops B's endpoint 
state.  B never gets to send LEFT to A (because it only waits 2s to announce, 
which can be just one gossip round), so A still thinks B is LEAVING.  C then 
sees B in a gossip digest from A and, knowing nothing about it, calls 
requestAll, but the LEAVING state A sends back is discarded because C still 
has B in justRemovedEndpoints.  Eventually, QUARANTINE_DELAY expires on C, 
and A unhelpfully propagates the stale LEAVING state back to C.
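
In sketch form, the quarantine interaction looks roughly like this (a heavily 
simplified, hypothetical class, not the actual Gossiper code; endpointStateMap, 
justRemovedEndpoints, and QUARANTINE_DELAY mirror the real field names, 
everything else is illustrative):

{code}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class GossiperSketch
{
    // illustrative value only
    static final long QUARANTINE_DELAY = 60 * 1000;

    final Map<InetAddress, Object> endpointStateMap = new ConcurrentHashMap<InetAddress, Object>();
    final Map<InetAddress, Long> justRemovedEndpoints = new ConcurrentHashMap<InetAddress, Long>();

    // C runs this when B's LEFT arrives: the state is dropped immediately,
    // so C can no longer gossip LEFT to anyone else.
    void removeEndpoint(InetAddress endpoint)
    {
        endpointStateMap.remove(endpoint);
        justRemovedEndpoints.put(endpoint, System.currentTimeMillis());
    }

    // C runs this on every incoming state: while B is quarantined, the
    // LEAVING state A keeps sending is silently dropped...
    void applyStateLocally(InetAddress ep, Object remoteState)
    {
        if (justRemovedEndpoints.containsKey(ep))
            return; // quarantined: ignore all gossip about this endpoint
        endpointStateMap.put(ep, remoteState);
    }

    // ...until the quarantine expires here, at which point A's stale
    // LEAVING state gets applied after all.
    void doStatusCheck()
    {
        long now = System.currentTimeMillis();
        for (Map.Entry<InetAddress, Long> entry : justRemovedEndpoints.entrySet())
            if (now - entry.getValue() > QUARANTINE_DELAY)
                justRemovedEndpoints.remove(entry.getKey());
    }
}
{code}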

The obvious solution is for B to announce LEFT for RING_DELAY, simply because 
that is the right thing to do, rather than for a one-off delay of 2 seconds.
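
Sketched out (the names here are stand-ins for the real decommission path, 
not the actual code):

{code}
class DecommissionSketch
{
    // roughly how long a state must be gossiped for the whole ring to see it
    static final long RING_DELAY = 30 * 1000;

    // stand-in for announcing STATUS=LEFT via the local application state
    void announceLeft()
    {
        System.out.println("announcing STATUS=LEFT");
    }

    void leaveRing() throws InterruptedException
    {
        announceLeft();
        // Before: sleep 2s (two gossip rounds), which may reach only one
        // peer before shutdown.  After: keep gossiping LEFT for RING_DELAY
        // so every node has a chance to see it directly.
        Thread.sleep(RING_DELAY);
    }
}
{code}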

However, this exposes a more subtle problem.  When removeEndpoint is called, we 
drop the state right away and track the endpoint in justRemovedEndpoints.  
Instead, we should hold on to the state so it is still propagated in further 
gossip digests, and expire it when we expire justRemovedEndpoints.
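
Against the sketch above, that change is a pair of drop-in replacements 
(again illustrative, not the actual patch):

{code}
// Keep the state on removal; only quarantine the endpoint.
void removeEndpoint(InetAddress endpoint)
{
    // endpointStateMap is left intact, so LEFT still appears in digests
    justRemovedEndpoints.put(endpoint, System.currentTimeMillis());
}

// Expire the state together with the quarantine entry.
void doStatusCheck()
{
    long now = System.currentTimeMillis();
    for (Map.Entry<InetAddress, Long> entry : justRemovedEndpoints.entrySet())
    {
        if (now - entry.getValue() > QUARANTINE_DELAY)
        {
            justRemovedEndpoints.remove(entry.getKey());
            endpointStateMap.remove(entry.getKey());
        }
    }
}
{code}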

Either of these changes alone is technically enough to solve this issue, but 
together they add an extra safeguard.  Changing where we expire the endpoint 
state is the more invasive of the two, but the gossip generation and version 
checks prevent any negative consequences from doing so.
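
For reference, the ordering that makes this safe looks roughly like this 
(illustrative class, not the real heartbeat code): a remote state only wins 
if its generation is newer, or the generation ties and its version is higher, 
so a stale copy of LEAVING or LEFT simply loses the comparison and is ignored.

{code}
class HeartBeatSketch
{
    final int generation; // bumped each time the node restarts
    final int version;    // bumped on every local state change

    HeartBeatSketch(int generation, int version)
    {
        this.generation = generation;
        this.version = version;
    }

    // true if this (remote) state should replace the local one
    boolean supersedes(HeartBeatSketch local)
    {
        if (generation != local.generation)
            return generation > local.generation;
        return version > local.version;
    }
}
{code}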

> Race condition during decommission
> ----------------------------------
>
>                 Key: CASSANDRA-2072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2072
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: 
> 0001-announce-having-left-the-ring-for-RING_DELAY-on-deco.patch, 
> 0002-Improve-TRACE-logging-for-Gossiper.patch, 
> 0003-Remove-endpoint-state-when-expiring-justRemovedEndpo.patch
>
>
> Occasionally when decommissioning a node, a race condition occurs where 
> another node never removes the token, and thus propagates it again with a 
> state of down.  With CASSANDRA-1900 we can recover from this, but it 
> shouldn't occur in the first place.
> Given nodes A, B, and C, if you decommission B it will stream to A and C.  
> When complete, B will decommission and receive this stacktrace:
> ERROR 00:02:40,282 Fatal exception in thread Thread[Thread-5,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>         at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:62)
>         at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>         at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:387)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91)
> At this point A will show that it is removing B's token, but C will not; 
> instead, C's failure detector will report B as dead, and nodetool ring on C 
> will show B in a leaving/down state.  In another gossip round, C will 
> propagate this state back to A.
