[
https://issues.apache.org/jira/browse/CASSANDRA-19219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Petrov updated CASSANDRA-19219:
------------------------------------
Since Version: 5.1
Source Control Link:
https://github.com/apache/cassandra/commit/46b90364daecf1880db5eda9899d7353ad81f445
Resolution: Fixed
Status: Resolved (was: Ready to Commit)
> CMS: restarting a CMS node with different ip address
> ----------------------------------------------------
>
> Key: CASSANDRA-19219
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19219
> Project: Cassandra
> Issue Type: Bug
> Components: Transactional Cluster Metadata
> Reporter: Paul Chandler
> Assignee: Alex Petrov
> Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when
> a pod goes down and is re created with a new ip address, the data is all
> stored on a detached volume so when the new pod is created all the old data
> for the node is reattached. In 4.0 this is handled correctly the node will
> come back up with the same hostid, tokens etc, just a new ip address and the
> cluster is healthy throughout.
>
> To simulate this I create a 3 node cluster on a local machine using 3
> loopback addresses
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> I then run nodetool -p 7199 reconfigurecms datacenter1:3 --sync to create 3
> CMS nodes
> I then bring down 127.0.0.1 and replace the rpc_address and listen_address
> with 127.0.0.4 and re start the node. The node then hangs with this as the
> last error message:
> (8821185654333640868,9200867415893016118]=ForRange\{lastModified=Epoch{epoch=12},
>
> endpointsForRange=[Full(/127.0.0.1:7000,(8821185654333640868,9200867415893016118]),
> Full(/127.0.0.2:7000,(8821185654333640868,9200867415893016118]),
> Full(/127.0.0.3:7000,(8821185654333640868,9200867415893016118])]},
> }}}, lockedRanges=LockedRanges\{lastModified=Epoch{epoch=14}, locked={}}}.
> This can mean that this node is configured differently from CMS.
> java.lang.AssertionError: not aware of any cluster members
> at
> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalReplicas(NetworkTopologyStrategy.java:233)
> at
> org.apache.cassandra.locator.CMSPlacementStrategy$DatacenterAware.reconfigure(CMSPlacementStrategy.java:119)
> at
> org.apache.cassandra.tcm.transformations.cms.PrepareCMSReconfiguration$Complex.execute(PrepareCMSReconfiguration.java:164)
> at
> org.apache.cassandra.tcm.log.LocalLog.processPendingInternal(LocalLog.java:429)
> at
> org.apache.cassandra.tcm.log.LocalLog$Async$AsyncRunnable.run(LocalLog.java:682)
> at
> org.apache.cassandra.concurrent.InfiniteLoopExecutor.loop(InfiniteLoopExecutor.java:121)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> WARN [GlobalLogFollower] 2023-12-21 11:11:34,408 LocalLog.java:693 -
> Stopping log processing on the node... All subsequent epochs will be ignored.
> org.apache.cassandra.tcm.log.LocalLog$StopProcessingException:
> java.lang.AssertionError: not aware of any cluster members
> at
> org.apache.cassandra.tcm.log.LocalLog.processPendingInternal(LocalLog.java:434)
> at
> org.apache.cassandra.tcm.log.LocalLog$Async$AsyncRunnable.run(LocalLog.java:682)
> at
> org.apache.cassandra.concurrent.InfiniteLoopExecutor.loop(InfiniteLoopExecutor.java:121)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.AssertionError: not aware of any cluster members
> at
> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalReplicas(NetworkTopologyStrategy.java:233)
> at
> org.apache.cassandra.locator.CMSPlacementStrategy$DatacenterAware.reconfigure(CMSPlacementStrategy.java:119)
> at
> org.apache.cassandra.tcm.transformations.cms.PrepareCMSReconfiguration$Complex.execute(PrepareCMSReconfiguration.java:164)
> at
> org.apache.cassandra.tcm.log.LocalLog.processPendingInternal(LocalLog.java:429)
> ... 4 common frames omitted
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]