[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

Sam Tunnicliffe (Jira) Thu, 21 Dec 2023 08:51:05 -0800


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sam Tunnicliffe updated CASSANDRA-19221:
----------------------------------------
     Bug Category: Parent values: Correctness(12982)Level 1 values: 
Unrecoverable Corruption / Loss(13161)
       Complexity: Normal
    Discovered By: Adhoc Test
    Fix Version/s: 5.1-alpha1
         Severity: Normal
           Status: Open  (was: Triage Needed)

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: Paul Chandler
>            Priority: Normal
>             Fix For: 5.1-alpha1
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and  ip addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> bin/nodetool -p 7199 describecms
> Cluster Metadata Service:
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip 
> addresses for the rpc_address and listen_address 
>  
> The nodes come back as normal, but the nodeid has now been swapped against 
> the ip address:
> Before:
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                   
>             Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             
> 6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             
> 6d194555-f6eb-41d0-c000-000000000001  rack1
>  
> After:
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                  
>              Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             
> 6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             
> 6d194555-f6eb-41d0-c000-000000000001  rack1
> On previous tests of this I have created a table with a replication factor of 
> 1, inserted some data before the swap.   After the swap the data on nodes 2 
> and 3 is now missing. 
> One theory I have is that I am using different port numbers for the different 
> nodes, and I am only swapping the ip addresses and not the port numbers, so 
> the ip:port still looks unique
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

Reply via email to