[
https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538155#comment-14538155
]
Ariel Weisberg commented on CASSANDRA-8336:
-------------------------------------------
[~brandon.williams] don't shoot the messenger, but you had a behavior of Gossip
you didn't like. You made changes to try and get a different behavior. But I
don't see a test added checking that?
I know Gossip testing is problematic, as you commented on CASSANDRA-9100. If
you as a developer can't test the changes you are making in a reasonable way,
then that is definitely retrospective fodder for "what went poorly". It also
maybe shows us what it is costing us not to tackle the singleton issue.
> Add shutdown gossip state to prevent timeouts during rolling restarts
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-8336
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Fix For: 2.0.15, 2.1.5
>
> Attachments: 8336-v2.txt, 8336-v3.txt, 8336-v4.txt, 8336.txt,
> 8366-v5.txt
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement. The problem here
> is that this isn't sufficient; you can still get TOEs and have to wait on the
> FD to figure things out. This happens due to gossip propagation time and
> variance; if node X shuts down and sends the message to Y, but Z has a
> greater gossip version than Y for X and has not yet received the message, it
> can initiate gossip with Y and thus mark X alive again. I propose
> quarantining to solve this, however I feel it should be a -D parameter you
> have to specify, so as not to destroy current dev and test practices, since
> this will mean a node that shuts down will not be able to restart until the
> quarantine expires.
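
The race in the description above can be illustrated with a toy model. This is
only a sketch under simplifying assumptions, not Cassandra's Gossiper: the
names Node, EndpointView, receive_shutdown, gossip_from and quarantine_ticks
are hypothetical, time is a logical tick counter, and "a higher version marks
the peer alive" stands in for the real state reconciliation.

```python
from dataclasses import dataclass


@dataclass
class EndpointView:
    """What one node believes about a peer (toy model, hypothetical names)."""
    version: int   # highest gossip version seen for the peer
    alive: bool    # local liveness verdict for the peer


class Node:
    def __init__(self, name, quarantine_ticks=0):
        self.name = name
        self.views = {}                # peer name -> EndpointView
        self.quarantine_ticks = quarantine_ticks
        self.quarantine_until = {}     # peer name -> tick when quarantine ends

    def receive_shutdown(self, peer, now):
        # The shutdown announcement marks the peer dead locally.
        self.views[peer].alive = False
        if self.quarantine_ticks:
            # Assumed quarantine: ignore gossip about the peer for a while.
            self.quarantine_until[peer] = now + self.quarantine_ticks

    def gossip_from(self, other, peer, now):
        if now < self.quarantine_until.get(peer, 0):
            return  # still quarantined: drop any state about this peer
        theirs, mine = other.views[peer], self.views[peer]
        if theirs.version > mine.version:
            # Newer state wins and is treated as evidence that the peer is up.
            mine.version = theirs.version
            mine.alive = True


def y_thinks_x_alive(quarantine_ticks):
    """Replay the X/Y/Z scenario and return Y's final verdict on X."""
    y = Node("Y", quarantine_ticks)
    z = Node("Z")
    y.views["X"] = EndpointView(version=10, alive=True)
    z.views["X"] = EndpointView(version=12, alive=True)  # Z is ahead of Y
    y.receive_shutdown("X", now=0)   # X announces shutdown to Y only
    y.gossip_from(z, "X", now=1)     # Z has not heard the news, gossips with Y
    return y.views["X"].alive


if __name__ == "__main__":
    print("no quarantine:  ", y_thinks_x_alive(0))
    print("with quarantine:", y_thinks_x_alive(5))
```

In this model Y marks X alive again when there is no quarantine, and keeps it
dead when the quarantine window covers Z's stale gossip. The same check would
also drop gossip from a restarted X until the window expires, which mirrors
the restart trade-off raised in the description.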