Josh McKenzie created CASSANDRA-17842:
-----------------------------------------

             Summary: Add the ability for operators to allow intentional 
loosening of definition of "empty" in Gossip for specific edge case failure 
scenarios
                 Key: CASSANDRA-17842
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17842
             Project: Cassandra
          Issue Type: Improvement
          Components: Cluster/Gossip
            Reporter: Josh McKenzie
            Assignee: Josh McKenzie


Right now {{empty}} is very specific to a single edge case (i.e. the check in 
{{isEmptyWithoutStatus()}} on {{hbState()}} + {{applicationState}}), but there 
are other failure cases where {{hbState()}} holds something you don't expect. 
These block host replacements and require intrusive workarounds and human 
intervention to recover from.

If we allow operators to opt in to a riskier (i.e. we don't know how we got 
there) definition of empty, then host replacements can make progress even when 
Gossip has gotten into a bad state. Which it does. All too often.
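
A minimal sketch of what the opt-in could look like, purely for illustration. 
The types and names below ({{AppState}}, {{EndpointSketch}}, 
{{looseEmptyEnabled}}) are hypothetical stand-ins, not Cassandra's real 
classes or the final parameter name, and "heartbeat is empty" is assumed to 
mean generation and version are both zero:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for gossip state; not Cassandra's real types.
enum AppState { STATUS, STATUS_WITH_PORT, HOST_ID, TOKENS }

final class EndpointSketch {
    final int heartbeatGeneration;   // assumption: 0/0 models an "empty" hbState()
    final int heartbeatVersion;
    final Set<AppState> appStates;   // which application states are present

    EndpointSketch(int generation, int version, Set<AppState> appStates) {
        this.heartbeatGeneration = generation;
        this.heartbeatVersion = version;
        this.appStates = appStates;
    }

    boolean hasStatus() {
        return appStates.contains(AppState.STATUS)
            || appStates.contains(AppState.STATUS_WITH_PORT);
    }

    // Strict definition: heartbeat state is empty AND no status is present.
    boolean isEmptyStrict() {
        return heartbeatGeneration == 0 && heartbeatVersion == 0 && !hasStatus();
    }

    // Loosened definition: when the operator has opted in, unexpected
    // heartbeat state is ignored and only the missing status is required.
    boolean isEmpty(boolean looseEmptyEnabled) {
        return isEmptyStrict() || (looseEmptyEnabled && !hasStatus());
    }
}
```

With the flag off, the sketch behaves exactly like the strict check; with it 
on, an endpoint with leftover heartbeat state but no status is also treated as 
empty, which is the case that currently blocks a replacement.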

This parameter will obviously need some NEWS.txt and other documentation around 
it to explain the context for end users.

Now that I think of it, a general "how to troubleshoot Gossip problems" guide 
might be worth writing up for operators and users, with this included as part 
of it, specifically on our 
[Troubleshooting|https://cassandra.apache.org/doc/latest/cassandra/troubleshooting/index.html]
 page. That should probably be a separate ticket, with the page update deferred 
to it; this one can rely on NEWS.txt and the parameter documentation, just to 
get the functionality into the system for operators who need it.



