[jira] [Created] (CASSANDRA-14559) Check for endpoint collision with hibernating nodes

Vincent White (JIRA) Thu, 05 Jul 2018 21:06:22 -0700

Vincent White created CASSANDRA-14559:
-----------------------------------------

Summary: Check for endpoint collision with hibernating nodes
Key: CASSANDRA-14559
URL: https://issues.apache.org/jira/browse/CASSANDRA-14559
Project: Cassandra
Issue Type: Bug
Reporter: Vincent White

I ran across an edge case when replacing a node with a node at the same
address. It results in the node (and its tokens) being unsafely removed from
gossip.

Steps to reproduce:

1. Create a 3-node cluster.
2. Stop a node.
3. Replace the stopped node with a new node at the same address, using the
replace_address flag.
4. Stop the node before it finishes bootstrapping.
5. Remove the replace_address flag and restart the node to resume bootstrapping
(if the data dir is also cleared at this point, the node will also generate new
tokens when it starts).
6. Stop the node again before it finishes bootstrapping.
7. 30 seconds later, the node will be removed from gossip because it now matches
the check for a FatClient.
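
To illustrate why step 7 happens, here is a minimal, self-contained sketch of
a FatClient-style eviction test. It is a simplified model, not the actual
Gossiper code: the class name, method name, and parameters are hypothetical,
and the 30 second timeout is taken from step 7 above. The assumption is that an
endpoint counts as a FatClient when it is not in a dead state, owns no tokens
in the ring, and has not updated its gossip state within the timeout.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of a FatClient eviction check.
final class FatClientSketch {
    // Assumed timeout: the 30 seconds observed in step 7 of the report.
    static final long FAT_CLIENT_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(30);

    // Returns true when the endpoint would be evicted from gossip.
    static boolean shouldEvict(boolean inDeadState,
                               boolean ownsTokensInRing,
                               long millisSinceLastUpdate) {
        // A gossip-only member is alive as far as its status says,
        // but is not a token-owning member of the ring.
        boolean gossipOnlyMember = !inDeadState && !ownsTokensInRing;
        return gossipOnlyMember && millisSinceLastUpdate > FAT_CLIENT_TIMEOUT_MS;
    }
}
```

Under this model, a node that restarts without replace_address and stops
mid-bootstrap would no longer be in a dead state and would not yet own tokens,
so it would be evicted once the timeout passes.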

I think this is only an issue when replacing a node with the same address
because other replacements now use STATUS_BOOTSTRAPPING_REPLACE and leave the
dead node unchanged.

I believe the simplest fix for this is to add a check that prevents a
non-bootstrapped node (without the replace_address flag) from starting if there
is a gossip entry for the same address in the hibernate state.
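
A rough sketch of what such a startup check could look like follows. It is not
the code from the PoC: the class, method, and parameter names are hypothetical,
gossip state is modelled as a plain map from address to status string, and the
status value "hibernate" is an assumption about how the hibernate state is
represented.

```java
import java.util.Map;

// Hypothetical startup guard; names and data model are illustrative only.
final class HibernateCollisionSketch {
    // Assumed string form of the hibernate gossip status.
    static final String STATUS_HIBERNATE = "hibernate";

    // Throws if a non-bootstrapped, non-replacing node tries to start while
    // gossip still holds a hibernating entry for its own address.
    static void checkStartupAllowed(String localAddress,
                                    Map<String, String> gossipStatusByAddress,
                                    boolean bootstrapped,
                                    boolean replacing) {
        if (bootstrapped || replacing) {
            return; // normal restart, or an explicit replace_address start
        }
        String status = gossipStatusByAddress.get(localAddress);
        if (STATUS_HIBERNATE.equals(status)) {
            throw new IllegalStateException(
                "A hibernating gossip entry exists for " + localAddress
                + "; restart with the replace_address flag to continue the replacement");
        }
    }
}
```

With a guard like this, step 5 of the reproduction would fail fast instead of
letting the node overwrite the hibernating entry.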

[3.11 PoC|https://github.com/apache/cassandra/compare/trunk...vincewhite:check_for_hibernate_on_start]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

