[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765429#comment-17765429
 ] 

Cameron Zemek commented on CASSANDRA-18845:
-------------------------------------------

I need to do more investigation into the slowness. I suspect it's due to the 
flood of gossip messages on startup. The previous patch, CASSANDRA-18543, removed 
the duplicate ECHO messages to cut down on this.

The behavior I notice in production, though, is a large initial 
delay (> 10 seconds) before any node is marked as `is now UP`, and then the 
events flood in. On large clusters it takes over a minute to receive them all. 
Prior to CASSANDRA-18543 it never checked liveSize at all, and so would start 
up regardless of the UP status of nodes. With that change, assuming the polling 
starts as UP statuses are being received, it waits. So the problem now is 
waiting for that initial event.

The previous patch from CASSANDRA-18543 allowed overriding the gossip 
parameters, but in hindsight it's difficult to determine a suitable default for 
that initial wait, as it's not consistent. The algorithm in waitToSettle relies 
on seeing a change in these values, so if that initial delay is greater than 
the wait time plus the polling phase, it will move on and start NTR even though 
we have yet to see any nodes as UP.

You are correct that even with this proposed patch it's still possible to start 
NTR too early, e.g. if one node reports UP but the delay before the next event 
is longer than the polling period. I am not seeing that in production so far, 
though. Therefore, the purpose of this patch is to have it wait for the first 
`is now UP` from a node instead of relying on 
cassandra.gossip_settle_min_wait_ms.
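
To make the timing argument concrete, here is a minimal, self-contained model of the polling loop. It is a sketch under stated assumptions, not the Cassandra source: the class name, the method name, and the idea of feeding in pre-recorded samples are all hypothetical, and it assumes the live count includes the local node (so `liveSize > 1` means at least one other node is UP). The single-node case and the forced timeout are not modelled.

```java
/**
 * Simplified model of a gossip settle loop (hypothetical names, not Cassandra code).
 * Each array index is one poll; index 0 is the reading taken before polling starts.
 */
final class GossipSettleSketch {

    /**
     * Returns the 1-based poll at which gossip is declared settled,
     * or -1 if the samples run out before that happens.
     */
    static int pollsUntilSettled(int[] epSamples, int[] liveSamples,
                                 int successesRequired, boolean requireLivePeer) {
        if (epSamples.length == 0 || epSamples.length != liveSamples.length) {
            throw new IllegalArgumentException("sample arrays must be non-empty and of equal length");
        }
        int numOkay = 0;
        int epSize = epSamples[0];
        int liveSize = liveSamples[0];
        for (int poll = 1; poll < epSamples.length; poll++) {
            int currentSize = epSamples[poll];
            int currentLive = liveSamples[poll];
            // A poll counts as "okay" only if nothing changed since the last poll.
            boolean unchanged = currentSize == epSize && currentLive == liveSize;
            // Proposed extra condition: at least one other node has been seen as UP.
            boolean peerSeen = !requireLivePeer || liveSize > 1;
            if (unchanged && peerSeen) {
                numOkay++;
            } else {
                numOkay = 0; // any change, or no live peer yet, resets the streak
            }
            epSize = currentSize;
            liveSize = currentLive;
            if (numOkay >= successesRequired) {
                return poll;
            }
        }
        return -1;
    }
}
```

With six known endpoints and live counts of 1, 1, 1, 1, 1, 4, 6, 6, 6, 6 across successive polls (a long initial delay, then a flood of UP events), requiring three consecutive okay polls settles at poll 3 without the extra condition, before any peer is UP, and at poll 9 with it.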

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Cameron Zemek
>            Priority: Normal
>         Attachments: 18845-3.11.patch, 18845-4.0.patch, 18845-4.1.patch, 
> 18845-5.0.patch, image-2023-09-14-11-16-23-020.png
>
>
> This is a follow-up to CASSANDRA-18543.
> Although that ticket added the ability to set 
> cassandra.gossip_settle_min_wait_ms, this is tedious and error-prone. On one 
> node I just observed a 79-second gap between waiting for gossip and the first 
> echo response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip 
> settles; otherwise queries can fail consistency levels such as LOCAL_QUORUM 
> because the node thinks the replicas are still in the DOWN state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that 
> (outside a single-node cluster) we wait for an UP message from another node 
> before considering gossip settled. E.g.:
> {code:java}
>             if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
>             {
>                 logger.debug("Gossip looks settled.");
>                 numOkay++;
>             } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

