[jira] Commented: (CASSANDRA-1451) Shutting down a node "cleanly" still kills client requests when the node goes down

Erik Onnen (JIRA) Mon, 22 Nov 2010 14:52:41 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934646#action_12934646
 ]


Erik Onnen commented on CASSANDRA-1451:
---------------------------------------

Here's what we observed that led to this being discussed on IRC.

After nodetool drain is executed, a node is no longer able to accept new write 
operations. This is problematic for several reasons in the current 
implementation:

1) The drained node actually accepts writes; it just won't process them 
locally, but it will ship them to remote endpoints. In 0.6.8, the write can 
actually succeed even though a timeout error is reported back to the client 
when the local write fails, causing the client to think the write failed when 
it in fact succeeded.
2) The drained node can still process some writes, just not writes for which 
it is a natural endpoint. This leads to non-deterministic behavior for clients, 
where some writes succeed but others fail.
3) The drained node can still process reads. This causes some upstream client 
libraries to think the node is healthy when in reality it should be shunned (at 
least for writes).
4) When a local write is rejected, it surfaces as a timeout exception. This is 
the same behavior that occurs when the pending read/write stage queues are 
full. In many cases it's proper for a client to retry when the read/write 
stages are full, but because both conditions look identical to the client, it 
cannot distinguish whether reads/writes are backed up or whether the local node 
is simply rejecting the write because it has been drained. Clients can't help 
themselves in this case; they're left to guess, which is bad.
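
To make points 1, 2 and 4 concrete, here is a toy in-memory model of the 
behavior described above. It is only a sketch, not Cassandra code: ToyNode, 
TimedOut, apply_locally, coordinate_write and demo are all hypothetical names, 
and it assumes the coordinator waits for an ack from every natural endpoint and 
reports any missing ack as a timeout.

```python
class TimedOut(Exception):
    """Stand-in for the timeout error reported back to the client."""


class ToyNode:
    """Hypothetical in-memory node; not Cassandra code."""

    def __init__(self, name):
        self.name = name
        self.drained = False
        self.data = {}

    def apply_locally(self, key, value):
        # Assumption: a drained node rejects local writes without saying why.
        if self.drained:
            return False
        self.data[key] = value
        return True

    def coordinate_write(self, replicas, key, value):
        # The coordinator ships the write to every natural endpoint,
        # including itself when it is one of them.
        acks = [replica.apply_locally(key, value) for replica in replicas]
        # Assumption: every replica must ack; a missing ack looks like a timeout.
        if not all(acks):
            raise TimedOut(key)


def demo():
    a, b, c = ToyNode("a"), ToyNode("b"), ToyNode("c")
    a.drained = True

    # Case 1: "a" is a natural endpoint. The client sees a timeout,
    # yet the remote replica "b" has already applied the write.
    try:
        a.coordinate_write([a, b], "k1", "v1")
        k1_timed_out = False
    except TimedOut:
        k1_timed_out = True

    # Case 2: "a" is only the coordinator, so the write succeeds.
    a.coordinate_write([b, c], "k2", "v2")

    return k1_timed_out, b.data, c.data
```

In this model the client gets the same TimedOut for "k1" that it would get from 
an overloaded node, even though "b" holds the value, while "k2" goes through 
the same drained coordinator without error. A distinct error for the drained 
case would let clients shun the node instead of guessing.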

> Shutting down a node "cleanly" still kills client requests when the node goes 
> down
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1451
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1451
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.5
>            Reporter: David King
>
> Shutting down a node, even more cleanly through drain, still kills some 
> requests with timeoutexceptions. Ideally, operations would not be sent at all 
> to nodes that are known to be shutting down, perhaps by shutting down gossip 
> before starting the draining process. 
> Other nodes will still need to have the phi convict threshold exceeded, but 
> presumably that's usually shorter than drain

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
