[
https://issues.apache.org/jira/browse/CASSANDRA-19696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853712#comment-17853712
]
Brandon Williams commented on CASSANDRA-19696:
----------------------------------------------
Problems with your network would explain these issues. 4.0 has been released
for some time without any similar reports.
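
One rough way to check the network hypothesis is to time plain TCP connects
to the storage port between the affected nodes. The sketch below is only an
illustration: the function name probe is hypothetical, port 7000 is taken from
the log excerpts quoted below, and a successful TCP connect says nothing about
the TLS handshake or Cassandra's own messaging handshake.

```python
import socket
import time


def probe(host, port=7000, timeout=5.0):
    """Return the TCP connect time in seconds, or None if the connect fails.

    Hypothetical helper: it only opens and closes a TCP connection and does
    not speak the Cassandra messaging protocol.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
```

Running probe("127.10.20.88") from 127.10.30.171 in a loop and logging the
results would show whether connects between the two nodes stall or fail around
the times the disconnects are logged.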
> Observed large number of Inbound / Outbound connection disconnect /
> reconnects in log
> -------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19696
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19696
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Kan Maung
> Priority: Normal
>
> We are seeing hundreds of InboundConnection established / closed messages on
> several of our clusters running Apache Cassandra 4.0.10. Judging by
> 'nodetool tpstats', gossip appears to be close to its timeout value. The
> problem affects both the LargeMessage and UrgentMessage connections.
> Gossiper uses MessagingService to send messages from source to destination
> over an OutboundConnection. Handling depends on the message type:
> LARGE_MESSAGES are enqueued in a separate thread pool, while
> URGENT_MESSAGES are delivered with Verb.Priority.P0.
> In the example below, the disconnect happens just 20 seconds after the
> connection was established. These two nodes are in the same datacenter, so
> there should be no geographical latency between them. This cluster (111)
> uses a very standard cassandra.yaml for our environment.
>
> 127.10.20.88 cassandra.log:
> 2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111
> ip_address=127.10.20.88 InboundConnectionInitiator.java:529 -
> /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471
> messaging connection established, version = 12, framing = CRC, encryption =
> encrypted(...)
> 2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
> ip_address=127.10.20.88 OutboundConnection.java:1059 -
> /127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel
> closed by provider
>
> 127.10.30.171 log:
> 2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111
> ip_address=127.10.30.171 OutboundConnection.java:1059 -
> /127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel
> closed by provider
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed:
> Connection timed out
> 2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
> ip_address=127.10.30.171 OutboundConnection.java:1059 -
> /127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel
> closed by provider
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed:
> Connection timed out
> 2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
> ip_address=127.10.30.171 OutboundConnection.java:1153 -
> /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869
> successfully connected, version = 12, framing = CRC, encryption =
> encrypted(...)
> 2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111
> ip_address=127.10.30.171 OutboundConnection.java:1153 -
> /127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284
> successfully connected, version = 12, framing = LZ4, encryption =
> encrypted(...)
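
To quantify the disconnect / reconnect churn in logs like the ones above, the
entries can be tallied per target address and connection type. This is a
minimal sketch, assuming each log entry sits on a single line (the excerpts
above are wrapped by the mailer) and follows the layout shown; the names
parse, summarize and SAMPLE are hypothetical. For inbound entries the address
after "->" is the local node, not the remote peer.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp at the start of an entry, e.g. "2024-05-13 02:06:13,805".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
# "->/<target-ip>:<port>-<TYPE>_MESSAGES-<id> <event text>".
CONN = re.compile(r"->/([\d.]+):\d+-([A-Z]+_MESSAGES)-[0-9a-f]+ (.*)")


def parse(line):
    """Return (timestamp, target, kind, event) or None for other lines."""
    ts = TS.search(line)
    conn = CONN.search(line)
    if not ts or not conn:
        return None
    target, kind, text = conn.groups()
    if "channel closed" in text:
        event = "closed"
    elif "established" in text or "successfully connected" in text:
        event = "connected"
    else:
        event = "other"
    when = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return when, target, kind, event


def summarize(lines):
    """Count events per (target, connection type, event)."""
    counts = Counter()
    for line in lines:
        parsed = parse(line)
        if parsed:
            _, target, kind, event = parsed
            counts[(target, kind, event)] += 1
    return counts


# Two entries from the 127.10.30.171 log above, joined onto single lines.
SAMPLE = [
    "2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1059 - "
    "/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 "
    "channel closed by provider",
    "2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1153 - "
    "/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-"
    "URGENT_MESSAGES-155d9869 successfully connected, version = 12, "
    "framing = CRC, encryption = encrypted(...)",
]

for key, count in sorted(summarize(SAMPLE).items()):
    print(key, count)
```

Continuation lines such as the io.netty exception text carry no timestamp and
are skipped. Feeding a whole cassandra.log through summarize shows whether the
churn is concentrated on a few targets, which would point at specific hosts or
network paths rather than at Cassandra itself.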