[
https://issues.apache.org/jira/browse/CASSANDRA-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ANSHUL SAINI updated CASSANDRA-18771:
-------------------------------------
Description:
When trying to replace a down node using the {_}*replace_address*{_}
property, the new node fails to start.
The messages below appear continuously in the system logs.
{noformat}
WARN [Messaging-EventLoop-3-2] 2023-08-16 14:18:58,565 NoSpamLogger.java:95 -
/xxx.xxx.xxx.xxx:7000->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel]
dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before
reaching the network
INFO [Messaging-EventLoop-3-2] 2023-08-16 14:19:23,910 NoSpamLogger.java:92 -
/xxx.xxx.xxx.xxx->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] failed to
connect
io.netty.channel.ConnectTimeoutException: connection timed out:
/xxx.xxx.xxx.xxx:7000
at
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
{noformat}
{noformat}
DEBUG [ScheduledTasks:1] 2023-08-16 19:09:56,919 StorageService.java:2399 -
Ignoring application state LOAD from /yyy.yyy.yyy.yyy:7000 because it is not a
member in token metadata
DEBUG [ScheduledTasks:1] 2023-08-16 19:10:56,920 StorageService.java:2399 -
Ignoring application state LOAD from /yyy.yyy.yyy.yyy:7000 because it is not a
member in token metadata
{noformat}
xxx.xxx.xxx.xxx - IP of the down node
yyy.yyy.yyy.yyy - IP of the new node
No other ERROR/WARN messages appear in the logs. The node goes into the UJ
(Up/Joining) state but never joins the ring.
This does not happen every time, but we have seen it more often since
upgrading from 3.11.9 to 4.0.5.
The configuration is correct: to mitigate the issue we terminate the node and
spawn a new one with the same configuration.
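For reference, the replacement is configured the usual way on the new node before its first start. A minimal sketch (the IPs are placeholders matching the log excerpts above; the env-file path and install layout are assumptions, not taken from our environment):

```shell
# Sketch: enable dead-node replacement on the *replacement* node before its
# first start. The IP and env-file path below are placeholders.
DEAD_NODE_IP="xxx.xxx.xxx.xxx"   # IP of the down node being replaced
ENV_FILE="./cassandra-env.sh"    # typically /etc/cassandra/cassandra-env.sh

# replace_address_first_boot is honored only on the node's first boot, so a
# node that has already joined the ring ignores it on restart.
echo "JVM_OPTS=\"\$JVM_OPTS -Dcassandra.replace_address_first_boot=${DEAD_NODE_IP}\"" >> "${ENV_FILE}"

# Verify the option was added before starting Cassandra.
grep "replace_address_first_boot" "${ENV_FILE}"
```

The older {{-Dcassandra.replace_address}} form also exists; the {{_first_boot}} variant is generally preferred because it has no effect once the node has joined.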
> Cassandra 4.0.5 nodes fail to start when replacing a dead node
> --------------------------------------------------------------
>
> Key: CASSANDRA-18771
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18771
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: ANSHUL SAINI
> Priority: Normal
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]