[ 
https://issues.apache.org/jira/browse/CASSANDRA-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755233#comment-17755233
 ] 

ANSHUL SAINI commented on CASSANDRA-18771:
------------------------------------------

Yes, no ERROR shows up in the logs, but the node just remains in a hung (UJ) state for
days without actually joining the cluster.
Could be related/similar to CASSANDRA-16877 (but that was fixed in 4.0.1).
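
To illustrate the hung state, the joining node looks roughly like this in {{nodetool status}} and never transitions to UN (illustrative output only; addresses, load, and host IDs anonymized):
{noformat}
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns  Host ID  Rack
UN  zzz.zzz.zzz.zzz  ...        256     ?     ...      rack1
UJ  yyy.yyy.yyy.yyy  ...        256     ?     ...      rack1
{noformat}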

> Cassandra 4.0.5 node fails to start when replacing dead node
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-18771
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18771
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: ANSHUL SAINI
>            Priority: Urgent
>
> When trying to replace a down node using the {_}*replace_address*{_} property, 
> the new node fails to start.
> The message below appears continuously in the system log.
> {noformat}
> WARN  [Messaging-EventLoop-3-2] 2023-08-16 14:18:58,565 NoSpamLogger.java:95 
> - /xxx.xxx.xxx.xxx:7000->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] 
> dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before 
> reaching the network
> INFO  [Messaging-EventLoop-3-2] 2023-08-16 14:19:23,910 NoSpamLogger.java:92 
> - /xxx.xxx.xxx.xxx->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] failed 
> to connect
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /xxx.xxx.xxx.xxx:7000
>     at 
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
>     at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>     at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
>     at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>     at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
>     at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
>  
> {noformat}
> DEBUG [ScheduledTasks:1] 2023-08-16 19:09:56,919 StorageService.java:2399 - 
> Ignoring application state LOAD from /yyy.yyy.yyy.yyy:7000 because it is not 
> a member in token metadata 
> DEBUG [ScheduledTasks:1] 2023-08-16 19:10:56,920 StorageService.java:2399 - 
> Ignoring application state LOAD from /yyy.yyy.yyy.yyy:7000 because it is not 
> a member in token metadata 
>  {noformat}
> xxx.xxx.xxx.xxx - IP of the down node
> yyy.yyy.yyy.yyy - IP of the new node
> No other ERROR/WARNING appears in the logs. The node goes into the UJ state but 
> never joins the ring.
> While this does not happen every time, we have seen it increasingly often since 
> upgrading from 3.11.9 to 4.0.5.
> The configuration is fine: to mitigate the issue we terminate the stuck node and 
> spawn a new one with the same configs.
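> For reference, a typical way of passing the *replace_address* property when 
> starting the replacement node (illustrative only; the exact mechanism, e.g. 
> JVM_EXTRA_OPTS vs. editing cassandra-env.sh, is an assumption):
> {noformat}
> # Illustrative sketch: pass the dead node's address as a JVM system property.
> # cassandra-env.sh appends JVM_EXTRA_OPTS to JVM_OPTS at startup.
> JVM_EXTRA_OPTS="-Dcassandra.replace_address=xxx.xxx.xxx.xxx" bin/cassandra
> {noformat}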



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
