[ https://issues.apache.org/jira/browse/CASSANDRA-19941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Goodness Ayinmode updated CASSANDRA-19941:
------------------------------------------
    Description: To execute the gossip protocol and exchange state info with
other nodes,
_[GossipTask.run()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L321]_
invokes
_[doGossipToLiveMember()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L955]_,
_[maybeGossipToUnreachableMember()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L964]_,
and
_[maybeGossipToSeed()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L982]_,
all three of which invoke
_[sendGossip()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L933]_
to send a gossip message to a randomly selected endpoint. The interaction
between GossipTask.run() and sendGossip() creates a potential synchronization
bottleneck because the lock
([_taskLock_|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L328])
is held during network-bound operations. GossipTask.run() also directly calls
[_waitUntilListening()_|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L326],
which waits for the MessagingService to start listening. This can delay the
task if the messaging service is slow to start or has problems. In addition,
if sendGossip() encounters network-related delays (e.g. network latency,
timeouts, slow or unresponsive nodes) in a large cluster, taskLock could be held
for longer periods. If such delays are frequent, this may lead to a backlog of
threads waiting on the lock and may also affect the scheduling of subsequent
tasks.
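
The change the issue title asks for can be illustrated with a minimal Java sketch. It assumes a gossip round can be split into a state phase and a send phase. The state phase (heartbeat update, target selection) needs the lock. The send phase does not. `GossipRoundSketch`, `Plan`, `runRound()` and `send()` are hypothetical names, not Cassandra APIs. Where a real node would hand the message to the MessagingService, `send()` only records it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; this is NOT Cassandra's Gossiper.GossipTask.
class GossipRoundSketch
{
    // Plays the role of taskLock: guards the local gossip state below.
    private final ReentrantLock taskLock = new ReentrantLock();
    private final List<String> liveEndpoints = new ArrayList<>();
    private final Random random = new Random();
    private long heartbeat = 0;

    // Test hooks: what was "sent", and whether any send ran with the lock held.
    final List<String> sent = new ArrayList<>();
    boolean sentUnderLock = false;

    GossipRoundSketch(List<String> endpoints)
    {
        liveEndpoints.addAll(endpoints);
    }

    // Immutable snapshot of one round's outgoing message (hypothetical type).
    static final class Plan
    {
        final String target;
        final long heartbeat;

        Plan(String target, long heartbeat)
        {
            this.target = target;
            this.heartbeat = heartbeat;
        }
    }

    // One round: update state and pick a target under the lock, send after unlocking.
    void runRound()
    {
        Plan plan;
        taskLock.lock();
        try
        {
            heartbeat++;                  // local state change still needs the lock
            if (liveEndpoints.isEmpty())
                return;                   // the finally block still releases the lock
            String target = liveEndpoints.get(random.nextInt(liveEndpoints.size()));
            plan = new Plan(target, heartbeat);
        }
        finally
        {
            taskLock.unlock();
        }
        send(plan);                       // network-bound step runs without taskLock
    }

    // Hypothetical stand-in for sendGossip(): records the message instead of using the network.
    private void send(Plan plan)
    {
        if (taskLock.isHeldByCurrentThread())
            sentUnderLock = true;
        sent.add(plan.target + "@" + plan.heartbeat);
    }
}
```

There is a trade-off in this design. The target is chosen from state that can change once the lock is released. A round may therefore still send to an endpoint that another thread has just marked down. Whether that is acceptable for Cassandra's gossip would need to be checked against the real code.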

> Move network operations outside the lock in Gossiper$GossipTask
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-19941
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19941
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Gossip
>            Reporter: Goodness Ayinmode
>            Priority: Normal
>
> To execute the gossip protocol and exchange state info with other nodes,
> _[GossipTask.run()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L321]_
> invokes
> _[doGossipToLiveMember()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L955]_,
> _[maybeGossipToUnreachableMember()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L964]_,
> and
> _[maybeGossipToSeed()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L982]_,
> all three of which invoke
> _[sendGossip()|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L933]_
> to send a gossip message to a randomly selected endpoint. The interaction
> between GossipTask.run() and sendGossip() creates a potential
> synchronization bottleneck because the lock
> ([_taskLock_|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L328])
> is held during network-bound operations. GossipTask.run() also directly
> calls
> [_waitUntilListening()_|https://github.com/apache/cassandra/blob/7b1eb1f0b717beb33e611157766701cd71e4ad8c/src/java/org/apache/cassandra/gms/Gossiper.java#L326],
> which waits for the MessagingService to start listening. This can delay the
> task if the messaging service is slow to start or has problems. In addition,
> if sendGossip() encounters network-related delays (e.g. network latency,
> timeouts, slow or unresponsive nodes) in a large cluster, taskLock could be
> held for longer periods. If such delays are frequent, this may lead to a
> backlog of threads waiting on the lock and may also affect the scheduling of
> subsequent tasks.
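
The scheduling concern can be illustrated with plain `java.util.concurrent`. This assumes that the gossip task is driven by `ScheduledExecutorService.scheduleWithFixedDelay`, which is not confirmed here and should be checked in `Gossiper.start()`. A fixed-delay task never overlaps itself: each run starts only after the previous run finishes and the delay elapses. One slow round therefore pushes back every later round. `FixedDelayDemo` and `startNanos()` are hypothetical names. `Thread.sleep` stands in for a slow `sendGossip()` or `waitUntilListening()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: how one stalled run delays later runs of a fixed-delay task.
class FixedDelayDemo
{
    /** Start times (System.nanoTime) of at least three runs; the first run stalls for stallMs. */
    static List<Long> startNanos(long delayMs, long stallMs)
    {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        List<Long> starts = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch threeRuns = new CountDownLatch(3);

        Runnable round = () -> {
            starts.add(System.nanoTime());
            if (starts.size() == 1)
            {
                try
                {
                    Thread.sleep(stallMs); // simulated slow network call in the first round
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                }
            }
            threeRuns.countDown();
        };

        executor.scheduleWithFixedDelay(round, 0, delayMs, TimeUnit.MILLISECONDS);
        try
        {
            threeRuns.await();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        finally
        {
            executor.shutdownNow();
        }
        return new ArrayList<>(starts);
    }
}
```

With a 50 ms delay and a 200 ms stall, the gap between the first two starts is at least the stall plus the delay. Later gaps fall back to roughly the delay alone. The demo shows only this scheduling effect. It does not model other threads contending for taskLock.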



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
