[jira] [Commented] (CASSANDRA-8336) Add shutdown gossip state to prevent timeouts during rolling restarts

Ariel Weisberg (JIRA) Mon, 11 May 2015 09:47:35 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538155#comment-14538155
 ]


Ariel Weisberg commented on CASSANDRA-8336:
-------------------------------------------

[~brandon.williams] don't shoot the messenger, but there was a Gossip behavior 
you didn't like, and you made changes to try to get a different behavior. I 
don't see a test added that checks for the new behavior, though?

I know Gossip testing is problematic, as you commented on CASSANDRA-9100. If you 
as a developer can't test the changes you are making in a reasonable way, then 
that is definitely retrospective fodder for what went poorly. It may also show 
us what it is costing us not to tackle the singleton issue.

> Add shutdown gossip state to prevent timeouts during rolling restarts
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-8336
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.15, 2.1.5
>
>         Attachments: 8336-v2.txt, 8336-v3.txt, 8336-v4.txt, 8336.txt, 
> 8366-v5.txt
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement.  The problem here 
> is that this isn't sufficient; you can still get timeout exceptions (TOEs) 
> and have to wait on the failure detector (FD) to figure things out.  This 
> happens due to gossip propagation time and variance: if node X shuts down 
> and sends the message to Y, but Z has a greater gossip version than Y for X 
> and has not yet received the message, Z can initiate gossip with Y and thus 
> mark X alive again.  I propose quarantining to solve this.  However, I feel 
> it should be a -D parameter you have to specify, so as not to destroy 
> current dev and test practices, since this will mean a node that shuts down 
> will not be able to restart until the quarantine expires.
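
The race described in the issue summary above can be illustrated with a small
simulation. This is only a sketch under simplifying assumptions, not
Cassandra's actual gossip code: the names EndpointView, Node,
on_shutdown_message, on_gossip and simulate are hypothetical, the version
numbers are made up, and "newer version wins" stands in for the real state
merge. It shows Y marking X down on the shutdown message, then Z (which holds
a newer version for X and has not yet heard of the shutdown) gossiping with Y
and reviving X, unless Y quarantines X for a while.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointView:
    """What one node believes about a peer (hypothetical, simplified)."""
    version: int
    alive: bool


@dataclass
class Node:
    name: str
    views: dict = field(default_factory=dict)        # peer name -> EndpointView
    quarantined: dict = field(default_factory=dict)  # peer name -> expiry time

    def on_shutdown_message(self, peer, now, quarantine_secs=0):
        # Direct shutdown announcement: mark the peer down locally.
        self.views[peer].alive = False
        if quarantine_secs > 0:
            self.quarantined[peer] = now + quarantine_secs

    def on_gossip(self, peer, remote, now):
        # Ignore state about a quarantined peer until the quarantine expires.
        if now < self.quarantined.get(peer, 0):
            return
        local = self.views[peer]
        # A strictly newer version wins and replaces the local view.
        if remote.version > local.version:
            self.views[peer] = EndpointView(remote.version, remote.alive)


def simulate(quarantine_secs):
    """Return whether Y believes X is alive after Z gossips with it."""
    y = Node("Y", {"X": EndpointView(version=9, alive=True)})
    z = Node("Z", {"X": EndpointView(version=10, alive=True)})
    y.on_shutdown_message("X", now=0, quarantine_secs=quarantine_secs)
    # Z has not heard about the shutdown yet and gossips its newer view to Y.
    y.on_gossip("X", z.views["X"], now=1)
    return y.views["X"].alive
```

With simulate(0), Y ends up believing X is alive again, which is the situation
that leads to timeouts. With simulate(30), Y ignores Z's stale-but-newer view
while the quarantine is in effect. The cost mentioned in the summary is
visible here too: during the quarantine Y would also ignore a genuine restart
of X.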



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
