[
https://issues.apache.org/jira/browse/CASSANDRA-20476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Tunnicliffe updated CASSANDRA-20476:
----------------------------------------
Test and Documentation Plan:
New unit and dtests added.
Any input on additional testing with K8s would be useful.
Status: Patch Available (was: In Progress)
This PR is slightly out of date and needs a rebase to pull in trunk commits
from the past couple of weeks.
It includes commits for CASSANDRA-20736, which is a prerequisite for this.
> Cluster is unable to recover after shutdown if IPs change
> ---------------------------------------------------------
>
> Key: CASSANDRA-20476
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20476
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Transactional Cluster Metadata
> Reporter: Michael Burman
> Assignee: Sam Tunnicliffe
> Priority: Normal
> Fix For: 5.x
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When a cluster is shut down, for whatever reason, in an environment where the
> IPs can change, the current TCM implementation prevents the cluster from
> recovering. The previous Gossip-based system was able to restart correctly
> after this, but with TCM the first node to start gets stuck trying to find
> nodes that no longer exist, which prevents it from starting at all.
> It repeatedly logs the following:
> {noformat}
> WARN [InternalResponseStage:218] 2025-03-24 12:31:53,433 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
> WARN [InternalResponseStage:219] 2025-03-24 12:32:03,496 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
> WARN [Messaging-EventLoop-3-3] 2025-03-24 12:32:13,528 NoSpamLogger.java:107 - /10.244.4.8:7000->/10.244.3.4:7000-URGENT_MESSAGES-[no-channel] dropping message of type TCM_COMMIT_REQ whose timeout expired before reaching the network
> WARN [InternalResponseStage:220] 2025-03-24 12:32:13,529 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
> INFO [Messaging-EventLoop-3-6] 2025-03-24 12:32:23,373 NoSpamLogger.java:104 - /10.244.4.8:7000->/10.244.6.7:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.ConnectTimeoutException: connection timed out after 2000 ms: /10.244.6.7:7000
>         at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
>         at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>         at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:156)
>         at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
>         at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
>         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:408)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>         at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> The node does not move forward. It is assigned as its own seed node with its
> current IP address, which is 10.244.4.8 in this case.
> {noformat}
> INFO [main] 2025-03-24 11:55:16,938 InboundConnectionInitiator.java:165 - Listening on address: (/10.244.4.8:7000), nic: eth0, encryption: unencrypted
> {noformat}
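> As an illustrative sketch (not a verbatim copy of this deployment's config),
> the effective seed setup on this pod is roughly the following, with the pod's
> own freshly assigned address listed as the only seed:
> {noformat}
> # cassandra.yaml fragment (illustrative; 10.244.4.8 is the pod's current IP
> # from the log above and changes whenever the pod is recreated)
> listen_address: 10.244.4.8
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "10.244.4.8"
> {noformat}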
> However, as seen from nodetool, the node is not aware of this:
> {noformat}
> [cassandra@cluster1-dc1-r1-sts-0 /]$ nodetool status
> Datacenter: dc1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load  Tokens  Owns (effective)  Host ID                               Rack
> DN  10.244.4.7  ?     16      64.7%             6d194555-f6eb-41d0-c000-000000000001  r1
> DN  10.244.6.7  ?     16      59.3%             6d194555-f6eb-41d0-c000-000000000002  r2
> DN  10.244.3.4  ?     16      76.0%             6d194555-f6eb-41d0-c000-000000000003  r3
> [cassandra@cluster1-dc1-r1-sts-0 /]$ nodetool cms
> Cluster Metadata Service:
> Members: /10.244.3.4:7000
> Needs reconfiguration: false
> Is Member: false
> Service State: REMOTE
> Is Migrating: false
> Epoch: 24
> Local Pending Count: 0
> Commits Paused: false
> Replication factor: ReplicationParams{class=org.apache.cassandra.locator.MetaStrategy, dc1=1}
> [cassandra@cluster1-dc1-r1-sts-0 /]$
> {noformat}
> It also never starts listening on port 9042. It waits for the other nodes
> forever, not realizing that its own IP address has changed. Since this happens
> to every node, the entire cluster is effectively dead.
> In this configuration I used a 3-rack, 3-node cluster and simply stopped the
> whole cluster in Kubernetes. initial_location_provider was
> RackDCFileLocationProvider and node_proximity was NetworkTopologyProximity, as
> these should behave like GossipingPropertyFileSnitch (according to the
> documentation). This scenario works fine with the older gossip-based
> implementation.
> The IPs in a Kubernetes deployment change every time a pod is deleted, so
> assuming any kind of static IP is not going to work and would be a serious
> downgrade from 5.0.
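> A rough way to reproduce the IP change outside this exact environment (a
> sketch; the per-rack StatefulSet names, e.g. cluster1-dc1-r1-sts, are assumed
> from the pod name shown above):
> {noformat}
> # Scale every rack's StatefulSet down, wait for the pods to terminate, then
> # scale back up. The recreated pods get new IPs from the CNI, which is what
> # leaves TCM pointing at addresses that no longer exist.
> kubectl scale statefulset cluster1-dc1-r1-sts cluster1-dc1-r2-sts cluster1-dc1-r3-sts --replicas=0
> kubectl scale statefulset cluster1-dc1-r1-sts cluster1-dc1-r2-sts cluster1-dc1-r3-sts --replicas=1
> {noformat}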