Josh McKenzie created CASSANDRA-17842:
-----------------------------------------
Summary: Add the ability for operators to allow intentional
loosening of definition of "empty" in Gossip for specific edge case failure
scenarios
Key: CASSANDRA-17842
URL: https://issues.apache.org/jira/browse/CASSANDRA-17842
Project: Cassandra
Issue Type: Improvement
Components: Cluster/Gossip
Reporter: Josh McKenzie
Assignee: Josh McKenzie
Right now the definition of {{empty}} is specific to a single edge case (i.e. in
{{isEmptyWithoutStatus()}}, our usage of hbState() + applicationState), but
there are other failure cases that block host replacements and require
intrusive workarounds and human intervention to recover from when
hbState() contains something you don't expect.
If we allow opt-in to a riskier (i.e. we don't know how we got there)
definition of empty, then host replacements can make progress even when
Gossip has gotten into a bad state, which happens all too often.
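To make the proposal concrete, here is a minimal sketch of the strict versus opt-in check. It is not the actual Cassandra code: the {{AppState}} enum, the {{heartbeatEmpty}} boolean standing in for hbState(), and the {{looseEmptyEnabled}} flag are all hypothetical names chosen for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified, hypothetical model of the "empty" check; not Cassandra's real classes.
final class LooseEmptySketch {
    // Stand-in for the application states a peer may have gossiped.
    enum AppState { STATUS, STATUS_WITH_PORT, HOST_ID, TOKENS }

    /**
     * @param heartbeatEmpty    true when the heartbeat state carries no information
     * @param states            application states currently known for the endpoint
     * @param looseEmptyEnabled hypothetical operator opt-in to the riskier definition
     */
    static boolean isEmptyWithoutStatus(boolean heartbeatEmpty,
                                        Set<AppState> states,
                                        boolean looseEmptyEnabled) {
        boolean hasStatus = states.contains(AppState.STATUS)
                         || states.contains(AppState.STATUS_WITH_PORT);
        if (hasStatus) {
            return false; // an endpoint with a status is never treated as empty
        }
        // Strict: the heartbeat must be empty as well.
        // Loose (opt-in): a missing status alone is enough.
        return heartbeatEmpty || looseEmptyEnabled;
    }

    public static void main(String[] args) {
        Set<AppState> unexpected = EnumSet.of(AppState.HOST_ID);
        System.out.println(isEmptyWithoutStatus(false, unexpected, false)); // strict check
        System.out.println(isEmptyWithoutStatus(false, unexpected, true));  // loose check
    }
}
```

In this sketch the flag only widens the check when no status is present, so an endpoint that has gossiped a status is never treated as empty regardless of the opt-in.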
This parameter will obviously need some NEWS.txt and other documentation around
it to explain the context for end users.
Now that I think of it, a general "how to troubleshoot Gossip problems" guide
might be worth writing up for operators and users, with this parameter covered
as part of it, specifically on our
[Troubleshooting|https://cassandra.apache.org/doc/latest/cassandra/troubleshooting/index.html]
page. That should probably be a separate ticket, with the page update deferred
to it. For this ticket we can rely on NEWS.txt and the parameter documentation,
just to get the functionality into the system for operators who need it.