[jira] [Commented] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

Paul Chandler (Jira) Thu, 21 Dec 2023 08:08:05 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799481#comment-17799481
 ]


Paul Chandler commented on CASSANDRA-19221:
-------------------------------------------

I tried changing all the nodes to port 9042, and that did not make a difference.

Here is an example of the data going missing 



cqlsh> desc paul;

CREATE KEYSPACE paul WITH replication = \{'class': 'NetworkTopologyStrategy', 
'datacenter1': '1'}  AND durable_writes = true;

CREATE TABLE paul.test (
    id int PRIMARY KEY
) WITH additional_write_policy = '99p'
    AND allow_auto_snapshot = true
    AND bloom_filter_fp_chance = 0.01
    AND caching = \{'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = \{'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
    AND compression = \{'chunk_length_in_kb': '16', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND incremental_backups = true
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';
cqlsh>

 

Before the node switch:



select * from paul.test ;

 id
----
  5
 10
  1
  8
  2
  4
  7
  6
  9
  3

(10 rows)

After the switch:



select * from paul.test ;

 id
----
  4

id 4 was the only data item stored on 127.0.0.1

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: Paul Chandler
>            Priority: Normal
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and  ip addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> bin/nodetool -p 7199 describecms
> Cluster Metadata Service:
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip 
> addresses for the rpc_address and listen_address 
>  
> The nodes come back as normal, but the nodeid has now been swapped against 
> the ip address:
> Before:
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                   
>             Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             
> 6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             
> 6d194555-f6eb-41d0-c000-000000000001  rack1
>  
> After:
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                  
>              Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             
> 6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             
> 6d194555-f6eb-41d0-c000-000000000001  rack1
> On previous tests of this I have created a table with a replication factor of 
> 1, inserted some data before the swap.   After the swap the data on nodes 2 
> and 3 is now missing. 
> One theory I have is that I am using different port numbers for the different 
> nodes, and I am only swapping the ip addresses and not the port numbers, so 
> the ip:port still looks unique
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

Reply via email to