[ https://issues.apache.org/jira/browse/CASSANDRA-19696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kan Maung updated CASSANDRA-19696:
----------------------------------
Description:
We are seeing hundreds of InboundConnection established / closed messages on
several of our clusters running Apache Cassandra 4.0.10. Judging from 'nodetool
tpstats', gossip appears to be close to the timeout value. The problem affects
both the LARGE_MESSAGES and URGENT_MESSAGES connections.
Gossiper uses MessagingService to send messages from the source to the
destination over an OutboundConnection. Handling depends on the message type:
LARGE_MESSAGES are enqueued in a separate thread pool, while URGENT_MESSAGES
are delivered with Verb.Priority.P0.
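A minimal Java sketch of the routing described above, under a simplified model:
a message goes to LARGE_MESSAGES when its serialized size exceeds a threshold,
to URGENT_MESSAGES when its priority is P0, and to SMALL_MESSAGES otherwise,
with large-message delivery handed to a dedicated executor. The class
ConnectionRouting, the Priority enum, the connectionTypeFor helper, the
SMALL_MESSAGES type, and the 64 KiB threshold are illustrative assumptions, not
Cassandra's actual OutboundConnection code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of choosing a connection type for an outbound
// message. Names and the threshold are assumptions, not Cassandra's API.
class ConnectionRouting
{
    enum ConnectionType { URGENT_MESSAGES, SMALL_MESSAGES, LARGE_MESSAGES }

    // Assumed priority levels; only P0 matters in this sketch.
    enum Priority { P0, P1, P2, P3, P4 }

    // Assumed size (64 KiB) above which a message counts as "large".
    static final int LARGE_MESSAGE_THRESHOLD = 64 * 1024;

    // Size is checked first, so a large message uses LARGE_MESSAGES even at P0;
    // otherwise P0 goes to URGENT_MESSAGES and everything else to SMALL_MESSAGES.
    static ConnectionType connectionTypeFor(int serializedSize, Priority priority)
    {
        if (serializedSize > LARGE_MESSAGE_THRESHOLD)
            return ConnectionType.LARGE_MESSAGES;
        return priority == Priority.P0 ? ConnectionType.URGENT_MESSAGES
                                       : ConnectionType.SMALL_MESSAGES;
    }

    public static void main(String[] args) throws InterruptedException
    {
        // Stand-in for the separate pool that large-message delivery uses in this model.
        ExecutorService largeMessagePool = Executors.newSingleThreadExecutor();
        int[] sizes = { 512, 512, 200_000 };
        Priority[] priorities = { Priority.P0, Priority.P2, Priority.P2 };
        for (int i = 0; i < sizes.length; i++)
        {
            ConnectionType type = connectionTypeFor(sizes[i], priorities[i]);
            Runnable send = () -> System.out.println(type + " sent on " + Thread.currentThread().getName());
            if (type == ConnectionType.LARGE_MESSAGES)
                largeMessagePool.submit(send); // handed off to a separate thread
            else
                send.run(); // delivered inline, standing in for the event loop
        }
        largeMessagePool.shutdown();
        largeMessagePool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Checking size before priority mirrors the description's point that large
messages take their own path regardless of verb.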
In the example below, this happens only about 20 seconds after the connection
was established. The two nodes are in the same datacenter, so there should be
no geographical latency between them. Cluster 111 uses a very standard
cassandra.yaml for our environment.
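The interval between the first two entries in the 127.10.20.88 log can be
checked with java.time. This is a minimal sketch: the LogInterval class and its
between helper are hypothetical, and the pattern assumes the
"yyyy-MM-dd HH:mm:ss,SSS" timestamp format used in these excerpts.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: interval between two timestamps in the log format
// seen in these excerpts ("yyyy-MM-dd HH:mm:ss,SSS").
class LogInterval
{
    static final DateTimeFormatter LOG_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static Duration between(String earlier, String later)
    {
        return Duration.between(LocalDateTime.parse(earlier, LOG_TIME),
                                LocalDateTime.parse(later, LOG_TIME));
    }

    public static void main(String[] args)
    {
        // 127.10.20.88: inbound URGENT_MESSAGES connection established,
        // then an outbound LARGE_MESSAGES channel closed.
        Duration gap = between("2024-05-13 02:06:13,805", "2024-05-13 02:06:32,201");
        System.out.println(gap.toMillis() + " ms"); // prints "18396 ms"
    }
}
```

That is roughly 18.4 s between the two entries.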
127.10.20.88 cassandra.log:
2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111
ip_address=127.10.20.88 InboundConnectionInitiator.java:529 -
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471
messaging connection established, version = 12, framing = CRC, encryption =
encrypted(...)
2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
ip_address=127.10.20.88 OutboundConnection.java:1059 -
/127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel closed
by provider
127.10.30.171 log:
2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111
ip_address=127.10.30.171 OutboundConnection.java:1059 -
/127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel
closed by provider
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed:
Connection timed out
2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
ip_address=127.10.30.171 OutboundConnection.java:1059 -
/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed
by provider
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed:
Connection timed out
2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
ip_address=127.10.30.171 OutboundConnection.java:1153 -
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869
successfully connected, version = 12, framing = CRC, encryption =
encrypted(...)
2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
ip_address=127.10.30.171 OutboundConnection.java:1153 -
/127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284
successfully connected, version = 12, framing = LZ4, encryption =
encrypted(...)
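To quantify the churn across a full log rather than by eye, connect and close
events can be tallied per connection type. This is a sketch under assumptions:
the hypothetical ConnectionEventCounter.count expects each log entry joined onto
a single line (the excerpts above are wrapped), matches connection ids of the
form "-<TYPE>_MESSAGES-<hex>" as they appear here, and recognizes only the three
phrases seen in these excerpts ("connection established", "successfully
connected", "channel closed").

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts connect/close events per connection type
// from single-line log entries like the excerpts above.
class ConnectionEventCounter
{
    // Connection ids in the excerpts end in "-<TYPE>_MESSAGES-<hex id>".
    static final Pattern CONNECTION_TYPE = Pattern.compile("-([A-Z]+_MESSAGES)-[0-9a-f]+");

    static Map<String, Integer> count(List<String> entries)
    {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String entry : entries)
        {
            Matcher m = CONNECTION_TYPE.matcher(entry);
            if (!m.find())
                continue; // e.g. exception lines that carry no connection id
            String event;
            if (entry.contains("channel closed"))
                event = "CLOSED";
            else if (entry.contains("connection established") || entry.contains("successfully connected"))
                event = "CONNECTED";
            else
                continue;
            counts.merge(m.group(1) + " " + event, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        // Message portions of the six log entries above, one entry per string.
        List<String> entries = Arrays.asList(
            "/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471 messaging connection established",
            "/127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel closed by provider",
            "/127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel closed by provider",
            "/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed by provider",
            "/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869 successfully connected",
            "/127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284 successfully connected");
        // prints {LARGE_MESSAGES CLOSED=2, LARGE_MESSAGES CONNECTED=1, URGENT_MESSAGES CLOSED=1, URGENT_MESSAGES CONNECTED=2}
        System.out.println(count(entries));
    }
}
```

Run over a whole day of logs from each node, counts like these make it easier
to compare churn on the two connection types.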
was:
We are seeing hundreds of InboundConnection established / closed messages on
several of our clusters running Apache Cassandra 4.0.10. Judging from 'nodetool
tpstats', gossip appears to be close to the timeout value.
The problem affects both the LARGE_MESSAGES and URGENT_MESSAGES connections.
In the example below, this happens only about 20 seconds after the connection
was established. The two nodes are in the same datacenter, so there should be
no geographical latency between them. Cluster 111 uses a very standard
cassandra.yaml for our environment.
Gossiper uses MessagingService to send messages from the source to the
destination over an OutboundConnection.
Handling depends on the message type: LARGE_MESSAGES are enqueued in a
separate thread pool, while URGENT_MESSAGES are delivered with Verb.Priority.P0.
127.10.20.88 cassandra.log:
2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111
ip_address=127.10.20.88 InboundConnectionInitiator.java:529 -
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471
messaging connection established, version = 12, framing = CRC, encryption =
encrypted(...)
2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
ip_address=127.10.20.88 OutboundConnection.java:1059 -
/127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel closed
by provider
127.10.30.171 log:
2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111
ip_address=127.10.30.171 OutboundConnection.java:1059 -
/127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel
closed by provider
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed:
Connection timed out
2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
ip_address=127.10.30.171 OutboundConnection.java:1059 -
/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed
by provider
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed:
Connection timed out
2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
ip_address=127.10.30.171 OutboundConnection.java:1153 -
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869
successfully connected, version = 12, framing = CRC, encryption =
encrypted(...)
2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
ip_address=127.10.30.171 OutboundConnection.java:1153 -
/127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284
successfully connected, version = 12, framing = LZ4, encryption =
encrypted(...)
> Observed large number of Inbound / Outbound connection disconnect /
> reconnects in log
> -------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19696
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19696
> Project: Cassandra
> Issue Type: Bug
> Reporter: Kan Maung
> Priority: Normal
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)