[
https://issues.apache.org/jira/browse/CASSANDRA-18771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ANSHUL SAINI updated CASSANDRA-18771:
-------------------------------------
Description:
When trying to replace a down node using the {_}*replace_address*{_}
property, the new node fails to start.
The messages below appear continuously in the system logs.
{noformat}
WARN [Messaging-EventLoop-3-2] 2023-08-16 14:18:58,565 NoSpamLogger.java:95 -
/xxx.xxx.xxx.xxx:7000->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel]
dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before
reaching the network
INFO [Messaging-EventLoop-3-2] 2023-08-16 14:19:23,910 NoSpamLogger.java:92 -
/xxx.xxx.xxx.xxx->/yyy.yyy.yyy.yyy:7000-URGENT_MESSAGES-[no-channel] failed to
connect
io.netty.channel.ConnectTimeoutException: connection timed out:
/xxx.xxx.xxx.xxx:7000
at
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:576)
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
at
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
{noformat}
{noformat}
DEBUG [ScheduledTasks:1] 2023-08-16 19:09:56,919 StorageService.java:2399 -
Ignoring application state LOAD from /yyy.yyy.yyy.yyy:7000 because it is not a
member in token metadata
DEBUG [ScheduledTasks:1] 2023-08-16 19:10:56,920 StorageService.java:2399 -
Ignoring application state LOAD from /yyy.yyy.yyy.yyy:7000 because it is not a
member in token metadata
{noformat}
xxx.xxx.xxx.xxx - IP of the down node
yyy.yyy.yyy.yyy - IP of the new node
No other ERROR/WARN messages appear in the logs. The node goes into the UJ
(Up/Joining) state but never joins the ring.
This does not happen every time, but we have seen it more often since
upgrading from 3.11.9 to 4.0.5.
The configuration is correct: to mitigate the issue we terminate the node and
spawn a new one with the same configuration.
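For reference, the replacement is configured the usual way on the new node before its first start. A minimal sketch (the IPs are placeholders matching the log excerpts above; the env-file path and install layout are assumptions, not taken from our environment):

```shell
# Sketch: enable dead-node replacement on the *replacement* node before its
# first start. The IP and env-file path below are placeholders.
DEAD_NODE_IP="xxx.xxx.xxx.xxx"   # IP of the down node being replaced
ENV_FILE="./cassandra-env.sh"    # typically /etc/cassandra/cassandra-env.sh

# replace_address_first_boot is honored only on the node's first boot, so a
# node that has already joined the ring ignores it on restart.
echo "JVM_OPTS=\"\$JVM_OPTS -Dcassandra.replace_address_first_boot=${DEAD_NODE_IP}\"" >> "${ENV_FILE}"

# Verify the option was added before starting Cassandra.
grep "replace_address_first_boot" "${ENV_FILE}"
```

The older {{-Dcassandra.replace_address}} form also exists; the {{_first_boot}} variant is generally preferred because it has no effect once the node has joined.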
> Cassandra 4.0.5 nodes fail to start when replacing a dead node
> --------------------------------------------------------------
>
> Key: CASSANDRA-18771
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18771
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: ANSHUL SAINI
> Priority: Normal
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]