ANSHUL SAINI created CASSANDRA-18771:
----------------------------------------
Summary: Cassandra 4.0.5 nodes fails to start when replacing dead
node
Key: CASSANDRA-18771
URL: https://issues.apache.org/jira/browse/CASSANDRA-18771
Project: Cassandra
Issue Type: Bug
Components: Cluster/Gossip
Reporter: ANSHUL SAINI
When trying to replace a down node using the replace_address property, the
new node fails to start.
The message below appears continuously in the system logs.
{noformat}
WARN [Messaging-EventLoop-3-2] 2023-08-16 14:18:58,565 NoSpamLogger.java:95 - /xxx.xxx.xxx.xxx:7000->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
INFO [Messaging-EventLoop-3-2] 2023-08-16 14:19:23,910 NoSpamLogger.java:92 - /xxx.xxx.xxx.xxx->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] failed to connect
io.netty.channel.ConnectTimeoutException: connection timed out: /xxx.xxx.xxx.xxx:7000
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
{noformat}
xxx.xxx.xxx.xxx - IP of down node
yyy.yyy.yyy.yyy - IP of new node
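The trace above shows a ConnectTimeoutException on the storage port (7000). One check that could be run from the new node is whether port 7000 on each of the other nodes accepts a TCP connection; a timeout towards the down node itself would be expected. The following is only a minimal sketch and not part of the original report: the helper name can_connect, the default timeout, and passing host addresses on the command line are all assumptions made for illustration.

```python
import socket
import sys


def can_connect(host: str, port: int = 7000, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 7000 is the storage port seen
    in the log excerpt, the timeout value is an arbitrary assumption.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Usage (assumed): python check_port.py <peer-ip> [<peer-ip> ...]
    for host in sys.argv[1:]:
        state = "reachable" if can_connect(host) else "unreachable"
        print(f"{host}:7000 {state}")
```

A successful TCP connect only shows that the port is open; it does not confirm that gossip messages are being exchanged.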
No other ERROR or WARNING messages appear in the logs. The node goes into the
UJ (Up/Joining) state but never joins the ring.
This does not always happen, but we have been seeing it more often since
upgrading from 3.11.9 to 4.0.5.
The configuration appears to be fine: to mitigate this, we terminate the node
and spawn a new one with the same configuration.