[
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Tunnicliffe updated CASSANDRA-19221:
----------------------------------------
Bug Category: Parent values: Correctness(12982), Level 1 values: Unrecoverable Corruption / Loss(13161)
Complexity: Normal
Discovered By: Adhoc Test
Fix Version/s: 5.1-alpha1
Severity: Normal
Status: Open (was: Triage Needed)
> CMS: Nodes can restart with new ipaddress already defined in the cluster
> ------------------------------------------------------------------------
>
> Key: CASSANDRA-19221
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
> Project: Cassandra
> Issue Type: Bug
> Components: Transactional Cluster Metadata
> Reporter: Paul Chandler
> Priority: Normal
> Fix For: 5.1-alpha1
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when
> several pods go down and IP addresses are swapped between nodes. In 4.0 this
> is blocked and the node cannot be restarted.
> To simulate this I create a 3-node cluster on a local machine using 3
> loopback addresses:
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> The nodes are created correctly and the first node is assigned as a CMS node
> as shown:
> bin/nodetool -p 7199 describecms
> Cluster Metadata Service:
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> At this point I bring down nodes 127.0.0.2 and 127.0.0.3 and swap their IP
> addresses for rpc_address and listen_address.
>
> The nodes come back as normal, but the host ID is now swapped relative to
> the IP address:
> Before:
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
>
> After:
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
> In previous runs of this test I created a table with a replication factor of
> 1 and inserted some data before the swap. After the swap, the data on nodes 2
> and 3 was missing.
> One theory I have is that I am using different port numbers for the different
> nodes, and I am only swapping the IP addresses and not the port numbers, so
> each ip:port pair still looks unique,
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043.
>
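The reporter's theory can be sketched in a few lines of Python. This is only an illustration of the reasoning, not Cassandra's actual address check: the `Endpoint` type, the `swap_ips` and `conflicts` helpers, and the node labels are hypothetical, and only the addresses and ports come from the report. It suggests that a uniqueness check keyed on ip:port would see no clash after the swap, whereas one keyed on IP alone would.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    ip: str
    port: int


# Hypothetical pre-swap layout, using the addresses and ports quoted in the report.
before = {
    "node2": Endpoint("127.0.0.2", 9043),
    "node3": Endpoint("127.0.0.3", 9044),
}


def swap_ips(layout, a, b):
    """Return a new layout where nodes a and b exchange IPs but keep their ports."""
    swapped = dict(layout)
    swapped[a] = Endpoint(layout[b].ip, layout[a].port)
    swapped[b] = Endpoint(layout[a].ip, layout[b].port)
    return swapped


def conflicts(before, after, key):
    """Nodes whose post-swap key was registered to a different node pre-swap."""
    owners = {key(ep): node for node, ep in before.items()}
    return sorted(
        node for node, ep in after.items()
        if owners.get(key(ep), node) != node
    )


after = swap_ips(before, "node2", "node3")

# Keyed on (ip, port): no post-swap pair matches any pre-swap pair.
by_ip_port = conflicts(before, after, key=lambda ep: (ep.ip, ep.port))
# Keyed on ip only: both nodes now claim an address owned by the other.
by_ip_only = conflicts(before, after, key=lambda ep: ep.ip)

print(by_ip_port)  # []
print(by_ip_only)  # ['node2', 'node3']
```

If the theory holds, repeating the test with identical ports on every node would be expected to trigger the same rejection seen in 4.0.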