[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772415#comment-17772415
 ] 

Cameron Zemek commented on CASSANDRA-18845:
-------------------------------------------

I have reworked the patch into a pull request: [Wait for live endpoints as 
part of waiting for gossip to settle by grom358 · Pull Request #2778 · 
apache/cassandra (github.com)|https://github.com/apache/cassandra/pull/2778]. 
I created the PR against 4.1 since 5.x is not as stable.

I still have not got around to writing an automated test for this. The patch 
has the following behaviors:
 * You must opt in by setting cassandra.gossip_settle_wait_live_max.
 * It waits up to the maximum number of polls defined by 
cassandra.gossip_settle_wait_live_max. Set it to -1 to wait indefinitely.
 * cassandra.skip_wait_for_gossip_to_settle still applies and caps the maximum 
number of polls.
 * Once you have opted in via cassandra.gossip_settle_wait_live_max, 
cassandra.gossip_settle_wait_live_required determines how many consecutive 
polls without a change in live endpoint state are needed to consider gossip 
settled.
 * If the number of live endpoints equals the number of endpoints, the live 
endpoints are considered settled.
 * At least 1 other live endpoint is required before it begins considering 
live endpoints as settled.

Scenarios considered:
 * One node cluster. Will skip this check since epSize == liveSize
 * Entire cluster is down and starting up a node. Will wait 
cassandra.gossip_settle_wait_live_max polls
 * Restarting a node when another node is down. Will wait 
cassandra.gossip_settle_wait_live_required polls
 * On rare occasions it takes a while to see another node as UP. This is 
covered by requiring at least 1 other endpoint as up `liveSize > 1` to start 
the settlement process.
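
The behaviors and scenarios above can be illustrated with a small simulation. 
This is only a sketch of the rules as described in this comment, not the code 
from the pull request: the class name, method name, and parameters are 
hypothetical, the endpoint count is held constant, sleeping between polls is 
omitted, and the cap from cassandra.skip_wait_for_gossip_to_settle is not 
modeled. It also assumes the first poll only establishes a baseline, so the 
exact poll counts may differ from the real patch.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of the polling rules described above; not the PR's code.
final class LiveSettleSketch
{
    /**
     * @param maxPolls       stand-in for cassandra.gossip_settle_wait_live_max; -1 means no cap
     * @param requiredStable stand-in for cassandra.gossip_settle_wait_live_required
     * @param epSize         number of known endpoints, including this node (constant here)
     * @param liveAt         maps a 0-based poll index to the live endpoint count seen at that poll
     * @return the number of polls performed before returning
     */
    static int pollsUntilSettled(int maxPolls, int requiredStable, int epSize, IntUnaryOperator liveAt)
    {
        int polls = 0;
        int stable = 0;
        int lastLive = -1; // no baseline yet
        while (maxPolls < 0 || polls < maxPolls)
        {
            int live = liveAt.applyAsInt(polls);
            polls++;
            if (live == epSize)
                return polls; // every known endpoint is live
            // Count consecutive unchanged polls only once another node is seen as UP.
            stable = (live > 1 && live == lastLive) ? stable + 1 : 0;
            lastLive = live;
            if (live > 1 && stable >= requiredStable)
                return polls;
        }
        return polls; // cap reached without settling
    }

    public static void main(String[] args)
    {
        System.out.println("single node:   " + pollsUntilSettled(10, 3, 1, i -> 1));
        System.out.println("cluster down:  " + pollsUntilSettled(10, 3, 3, i -> 1));
        System.out.println("one peer down: " + pollsUntilSettled(10, 3, 3, i -> 2));
        System.out.println("slow first UP: " + pollsUntilSettled(10, 3, 3, i -> i < 2 ? 1 : 2));
    }
}
```

In this model the "cluster down" case runs until the cap, while the "slow 
first UP" case does not start counting stable polls until a second live 
endpoint appears.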

Being opt-in, this doesn't break any existing tests. It is also easier to use 
than the reverted patch, as you only need to set 
cassandra.gossip_settle_wait_live_max. To restate, the purpose of this patch is 
to prevent Native-Transport-Requests from starting before Cassandra has 
finished its ECHO requests to other nodes. When that happens, requests fail 
LOCAL_QUORUM/QUORUM consistency because the endpoints are not yet considered 
live for the purposes of executing requests.
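
As an illustration of the opt-in, the settings could be read roughly as 
follows. This sketch assumes the two settings are plain JVM system properties 
(for example passed as -Dcassandra.gossip_settle_wait_live_max=...); the class, 
its fields, and the fallback value of 3 for 
cassandra.gossip_settle_wait_live_required are hypothetical and not taken from 
the pull request.

```java
// Hypothetical helper: property names come from this comment, everything else is assumed.
final class SettleLiveOptions
{
    static final String MAX_PROP = "cassandra.gossip_settle_wait_live_max";
    static final String REQUIRED_PROP = "cassandra.gossip_settle_wait_live_required";

    final boolean enabled;   // true only when MAX_PROP is set (opt-in)
    final int maxPolls;      // -1 means wait indefinitely
    final int requiredPolls; // consecutive unchanged polls needed

    private SettleLiveOptions(boolean enabled, int maxPolls, int requiredPolls)
    {
        this.enabled = enabled;
        this.maxPolls = maxPolls;
        this.requiredPolls = requiredPolls;
    }

    static SettleLiveOptions fromSystemProperties()
    {
        // Integer.getInteger returns null when the property is unset or not a valid integer.
        Integer max = Integer.getInteger(MAX_PROP);
        // The fallback of 3 is an assumption made for this sketch only.
        int required = Integer.getInteger(REQUIRED_PROP, 3);
        return new SettleLiveOptions(max != null, max == null ? 0 : max, required);
    }

    public static void main(String[] args)
    {
        SettleLiveOptions opts = fromSystemProperties();
        System.out.println("enabled=" + opts.enabled
                           + " maxPolls=" + opts.maxPolls
                           + " requiredPolls=" + opts.requiredPolls);
    }
}
```

With no property set, the feature stays disabled in this sketch, which matches 
the opt-in behavior described above.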

This comes up every time we perform a rolling restart of a large cluster for 
security patches and similar operations, where we typically only allow a 
single node to be down at a time. With this pull request, the wait for live 
endpoints ends once all endpoints are UP, which minimizes the time needed for 
rolling restarts while avoiding failed queries that affect clients.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Cameron Zemek
>            Priority: Normal
>         Attachments: 18845-seperate.patch, delay.log, example.log, 
> image-2023-09-14-11-16-23-020.png, stream.log, test1.log, test2.log, test3.log
>
>
> This is a follow-up to CASSANDRA-18543.
> Although that ticket added the ability to set 
> cassandra.gossip_settle_min_wait_ms, this is tedious and error-prone. On one 
> node I just observed a 79 second gap between waiting for gossip and the first 
> echo response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip 
> settles; otherwise queries can fail consistency levels such as LOCAL_QUORUM 
> because it thinks the replicas are still in the DOWN state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that 
> (outside a single-node cluster) we wait for an UP message from another node 
> before considering gossip as settled. E.g.:
> {code:java}
>             if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
>             {
>                 logger.debug("Gossip looks settled.");
>                 numOkay++;
>             } {code}



