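[jira] [Updated] (CASSANDRA-19696) Observed large number of Inbound / Outbound connection disconnect / reconnects in log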

Kan Maung (Jira) Mon, 10 Jun 2024 08:32:03 -0700


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-19696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kan Maung updated CASSANDRA-19696:
----------------------------------
    Description: 
We are seeing hundreds of InboundConnection established / closed messages in 
the logs on several of our clusters running Apache Cassandra 4.0.10.  Judging 
by 'nodetool tpstats', gossip is close to the timeout value.  The problem 
affects both the LARGE_MESSAGES and URGENT_MESSAGES connections.

Gossiper uses MessagingService to send messages from the source to the 
destination over an OutboundConnection.  Handling depends on the message type: 
LARGE_MESSAGES are enqueued in a separate thread pool, while URGENT_MESSAGES 
are delivered with Verb.Priority.P0.

In the example below, a disconnect happens just 20 seconds after a connection 
was established. The two nodes are in the same datacenter, so there should be 
no geographical latency between them. This cluster (111) uses a very standard 
cassandra.yaml for our environment.

 

127.10.20.88 cassandra.log:

2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 
ip_address=127.10.20.88 InboundConnectionInitiator.java:529 - 
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471
 messaging connection established, version = 12, framing = CRC, encryption = 
encrypted(...)

2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.20.88 OutboundConnection.java:1059 - 
/127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel closed 
by provider

 

127.10.30.171 log:

2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1059 - 
/127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel 
closed by provider

io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection timed out

2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1059 - 
/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed 
by provider

io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection timed out

2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1153 - 
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869
 successfully connected, version = 12, framing = CRC, encryption = 
encrypted(...)

2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1153 - 
/127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284
 successfully connected, version = 12, framing = LZ4, encryption = 
encrypted(...)

  was:
We are seeing hundreds of InboundConnection established / closed messages on 
several of our clusters running Apache Cassandra 4.0.10. Looking at 'nodetool 
tpstats' it seems gossip is close to the time out value.

We are seeing hundreds of InboundConnection established / closed messages on 
several of our clusters running Apache Cassandra 4.0.10. Looking at 'nodetool 
tpstats' it seems gossip is close to the time out value.
It affects both the LargeMessage and UrgentMessage connections.

 


In the example below this happens just 20 seconds after it connected. These two 
nodes are in the same datacenter, so there should be no geographical latency 
between them. This cluster 111 has a very standard cassandra.yaml for our 
environment.

Gossiper uses MessagingService to send messages from the source to destination 
using OutboundConnection.

Depending on the message type especially for LARGE_MESSAGES it is enqueued in a 
separate thread pool while URGENT_MESSAGES are delivered with Verb.Priority.P0.


127.10.20.88 cassandra.log:

2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 
ip_address=127.10.20.88 InboundConnectionInitiator.java:529 - 
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471
 messaging connection established, version = 12, framing = CRC, encryption = 
encrypted(...)

2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.20.88 OutboundConnection.java:1059 - 
/127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel closed 
by provider

 


127.10.30.171 log:

2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1059 - 
/127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel 
closed by provider

io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection timed out

2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1059 - 
/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed 
by provider

io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
Connection timed out

2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1153 - 
/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869
 successfully connected, version = 12, framing = CRC, encryption = 
encrypted(...)

2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
ip_address=127.10.30.171 OutboundConnection.java:1153 - 
/127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284
 successfully connected, version = 12, framing = LZ4, encryption = 
encrypted(...)


> Observed large number of Inbound / Outbound connection disconnect / 
> reconnects in log
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19696
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Kan Maung
>            Priority: Normal
>
> We are seeing hundreds of InboundConnection established / closed messages in 
> the logs on several of our clusters running Apache Cassandra 4.0.10.  Judging 
> by 'nodetool tpstats', gossip is close to the timeout value.  The problem 
> affects both the LARGE_MESSAGES and URGENT_MESSAGES connections.
> Gossiper uses MessagingService to send messages from the source to the 
> destination over an OutboundConnection.  Handling depends on the message 
> type: LARGE_MESSAGES are enqueued in a separate thread pool, while 
> URGENT_MESSAGES are delivered with Verb.Priority.P0.
> In the example below, a disconnect happens just 20 seconds after a 
> connection was established. The two nodes are in the same datacenter, so 
> there should be no geographical latency between them. This cluster (111) 
> uses a very standard cassandra.yaml for our environment.
>  
> 127.10.20.88 cassandra.log:
> 2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 
> ip_address=127.10.20.88 InboundConnectionInitiator.java:529 - 
> /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471
>  messaging connection established, version = 12, framing = CRC, encryption = 
> encrypted(...)
> 2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
> ip_address=127.10.20.88 OutboundConnection.java:1059 - 
> /127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel 
> closed by provider
>  
> 127.10.30.171 log:
> 2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 
> ip_address=127.10.30.171 OutboundConnection.java:1059 - 
> /127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel 
> closed by provider
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
> Connection timed out
> 2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
> ip_address=127.10.30.171 OutboundConnection.java:1059 - 
> /127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel 
> closed by provider
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
> Connection timed out
> 2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
> ip_address=127.10.30.171 OutboundConnection.java:1153 - 
> /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869
>  successfully connected, version = 12, framing = CRC, encryption = 
> encrypted(...)
> 2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 
> ip_address=127.10.30.171 OutboundConnection.java:1153 - 
> /127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284
>  successfully connected, version = 12, framing = LZ4, encryption = 
> encrypted(...)
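
To put numbers on "hundreds" of reconnects, the log entries quoted above could be tallied with a small script. The following is only a rough sketch, not part of the original report: it assumes each entry sits on a single line in cassandra.log (the wrapping above comes from the e-mail), and the names EVENT_RE, parse_events, count_by_kind and reconnect_gaps are made up for illustration. The sample lines are the four outbound entries from the 127.10.30.171 log.

```python
import re
from collections import Counter
from datetime import datetime

CLOSED = "channel closed by provider"

# Assumed layout: one entry per line, i.e. the entries quoted above with the
# e-mail line wrapping undone. "dest" is the address to the right of "->".
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"->(?P<dest>/[\d.]+:\d+)-(?P<kind>[A-Z]+_MESSAGES)-[0-9a-f]+\s+"
    r"(?P<event>messaging connection established|successfully connected|"
    r"channel closed by provider)"
)

SAMPLE = [
    "2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1059 - "
    "/127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f "
    "channel closed by provider",
    "2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1059 - "
    "/127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 "
    "channel closed by provider",
    "2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1153 - "
    "/127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-"
    "URGENT_MESSAGES-155d9869 successfully connected, version = 12, "
    "framing = CRC, encryption = encrypted(...)",
    "2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 "
    "ip_address=127.10.30.171 OutboundConnection.java:1153 - "
    "/127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-"
    "LARGE_MESSAGES-04b51284 successfully connected, version = 12, "
    "framing = LZ4, encryption = encrypted(...)",
]


def parse_events(lines):
    """Return (timestamp, dest, kind, event) for every matching log line."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m:  # stack-trace lines and unrelated entries are skipped
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m["dest"], m["kind"], m["event"]))
    return events


def count_by_kind(events):
    """Count events per (connection type, event text)."""
    return Counter((kind, event) for _, _, kind, event in events)


def reconnect_gaps(events):
    """Seconds from a close to the next connect for the same (dest, kind)."""
    last_close = {}
    gaps = []
    for ts, dest, kind, event in sorted(events):
        key = (dest, kind)
        if event == CLOSED:
            last_close[key] = ts
        elif key in last_close:
            gaps.append((key, (ts - last_close.pop(key)).total_seconds()))
    return gaps


if __name__ == "__main__":
    parsed = parse_events(SAMPLE)
    for key, n in sorted(count_by_kind(parsed).items()):
        print(key, n)
    for key, seconds in reconnect_gaps(parsed):
        print(key, f"reconnected after {seconds:.3f}s")
```

On these four entries the script finds one close/reconnect pair, the URGENT_MESSAGES connection to /127.10.20.88:7000, which was re-established about 26.9 seconds after it closed. Run against a full cassandra.log, the same counts per connection type would show whether the churn is concentrated on particular peers.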



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
