Michael Burman created CASSANDRA-20476:
------------------------------------------
Summary: Cluster is unable to recover after shutdown if IPs change
Key: CASSANDRA-20476
URL: https://issues.apache.org/jira/browse/CASSANDRA-20476
Project: Apache Cassandra
Issue Type: Bug
Components: Transactional Cluster Metadata
Reporter: Michael Burman
When a cluster is shut down for any reason in an environment where IPs can
change, the current TCM implementation prevents the cluster from recovering.
The previous Gossip-based system was able to restart correctly after this
process, but with TCM the first node to start gets stuck trying to find nodes
that no longer exist, which prevents it from starting entirely.
It spams the following to the logs:
{noformat}
WARN [InternalResponseStage:218] 2025-03-24 12:31:53,433 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
WARN [InternalResponseStage:219] 2025-03-24 12:32:03,496 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
WARN [Messaging-EventLoop-3-3] 2025-03-24 12:32:13,528 NoSpamLogger.java:107 - /10.244.4.8:7000->/10.244.3.4:7000-URGENT_MESSAGES-[no-channel] dropping message of type TCM_COMMIT_REQ whose timeout expired before reaching the network
WARN [InternalResponseStage:220] 2025-03-24 12:32:13,529 RemoteProcessor.java:227 - Got error from /10.244.3.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/10.244.3.4:7000], checkLive=true}
INFO [Messaging-EventLoop-3-6] 2025-03-24 12:32:23,373 NoSpamLogger.java:104 - /10.244.4.8:7000->/10.244.6.7:7000-URGENT_MESSAGES-[no-channel] failed to connect
io.netty.channel.ConnectTimeoutException: connection timed out after 2000 ms: /10.244.6.7:7000
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:156)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:408)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}
The node does not move forward. It is assigned as its own seed node with its
current IP address, which is 10.244.4.8 in this case.
{noformat}
INFO [main] 2025-03-24 11:55:16,938 InboundConnectionInitiator.java:165 - Listening on address: (/10.244.4.8:7000), nic: eth0, encryption: unencrypted
{noformat}
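For reference, a minimal sketch of what the seed configuration looks like here. This assumes the stock SimpleSeedProvider purely for illustration; in this deployment the seed address is resolved by the operator, and the literal IP below is just the value from the log above.
{noformat}
# cassandra.yaml (sketch only, not the exact file from the deployment)
seed_provider:
  # assuming the stock SimpleSeedProvider; the operator may substitute its own
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # the node's own, current pod IP as seen in the log above
      - seeds: "10.244.4.8"
{noformat}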
However, as seen from nodetool, the node is not aware of this:
{noformat}
[cassandra@cluster1-dc1-r1-sts-0 /]$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load  Tokens  Owns (effective)  Host ID                               Rack
DN  10.244.4.7  ?     16      64.7%             6d194555-f6eb-41d0-c000-000000000001  r1
DN  10.244.6.7  ?     16      59.3%             6d194555-f6eb-41d0-c000-000000000002  r2
DN  10.244.3.4  ?     16      76.0%             6d194555-f6eb-41d0-c000-000000000003  r3
[cassandra@cluster1-dc1-r1-sts-0 /]$ nodetool cms
Cluster Metadata Service:
Members: /10.244.3.4:7000
Needs reconfiguration: false
Is Member: false
Service State: REMOTE
Is Migrating: false
Epoch: 24
Local Pending Count: 0
Commits Paused: false
Replication factor: ReplicationParams{class=org.apache.cassandra.locator.MetaStrategy, dc1=1}
[cassandra@cluster1-dc1-r1-sts-0 /]$
{noformat}
It also never starts listening on port 9042. It waits for the other nodes
forever, not understanding that its own IP address has changed. Since this
happens to every node, the entire cluster is effectively dead.
In this setup I used a 3-rack, 3-node system and simply stopped the cluster in
Kubernetes. initial_location_provider was RackDCFileLocationProvider and
node_proximity was NetworkTopologyProximity, since according to the
documentation these should behave like GossipingPropertyFileSnitch. This
scenario works fine with the older Gossip-based implementation.
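A sketch of the location-related settings described above; the cassandra-rackdc.properties contents are an assumption based on the 3-rack layout and are shown for the node in rack r1.
{noformat}
# cassandra.yaml (settings referenced above)
initial_location_provider: RackDCFileLocationProvider
node_proximity: NetworkTopologyProximity

# cassandra-rackdc.properties (assumed contents for a node in rack r1)
dc=dc1
rack=r1
{noformat}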
In a Kubernetes deployment the IPs change every time a pod is deleted, so
assuming any sort of static IPs is not going to work and would be a serious
downgrade from 5.0.