[
https://issues.apache.org/jira/browse/CASSANDRA-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ANSHUL SAINI updated CASSANDRA-18771:
-------------------------------------
Severity: Critical
> Cassandra 4.0.5 nodes fails to start when replacing dead node
> -------------------------------------------------------------
>
> Key: CASSANDRA-18771
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18771
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: ANSHUL SAINI
> Priority: Urgent
>
> When replacing a down node using the {_}*replace_address*{_} property, the
> new node fails to start.
> The following messages appear continuously in the system logs.
> {noformat}
> WARN [Messaging-EventLoop-3-2] 2023-08-16 14:18:58,565 NoSpamLogger.java:95 - /xxx.xxx.xxx.xxx:7000->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-2] 2023-08-16 14:19:23,910 NoSpamLogger.java:92 - /xxx.xxx.xxx.xxx->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.ConnectTimeoutException: connection timed out: /xxx.xxx.xxx.xxx:7000
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
>     at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>     at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>     at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
>
> {noformat}
> DEBUG [ScheduledTasks:1] 2023-08-16 19:09:56,919 StorageService.java:2399 - Ignoring application state LOAD from /yyy.yyy.yyy.yyy:7000 because it is not a member in token metadata
> DEBUG [ScheduledTasks:1] 2023-08-16 19:10:56,920 StorageService.java:2399 - Ignoring application state LOAD from /yyy.yyy.yyy.yyy:7000 because it is not a member in token metadata
> {noformat}
> xxx.xxx.xxx.xxx - IP of the down node
> yyy.yyy.yyy.yyy - IP of the new node
> No other ERROR or WARN messages appear in the logs. The node goes into the
> UJ state but never joins the ring.
> This does not happen every time, but we have seen it more often since
> upgrading from 3.11.9 to 4.0.5.
> The configuration appears to be fine, since our mitigation is to terminate
> the node and spawn a new one with the same configuration.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)