[
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772415#comment-17772415
]
Cameron Zemek commented on CASSANDRA-18845:
-------------------------------------------
I have reworked the patch into a pull request: [Wait for live endpoints as
part of waiting for gossip to settle by grom358 · Pull Request #2778 ·
apache/cassandra (github.com)|https://github.com/apache/cassandra/pull/2778].
I created the PR against 4.1 since 5.x is not as stable.
I have not yet got around to writing an automated test for this. The patch has
the following behaviors:
* Must opt-in by setting cassandra.gossip_settle_wait_live_max
* Waits for at most the number of polls defined by
cassandra.gossip_settle_wait_live_max. Set it to -1 to wait indefinitely.
* cassandra.skip_wait_for_gossip_to_settle still applies to cap the maximum
number of polls.
* Once opted in via cassandra.gossip_settle_wait_live_max,
cassandra.gossip_settle_wait_live_required determines how many consecutive
polls without a change in live endpoint state are needed to consider gossip as
settled.
* If live endpoint size equals number of endpoints, consider live endpoints as
settled.
* Requires at least 1 other live endpoint to begin considering live endpoints
as settled.
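
The behaviors above can be sketched as a small simulation. This is not the code
from the pull request: the class name, method name and snapshot layout are
hypothetical, only the two property names come from the list above, and the cap
from cassandra.skip_wait_for_gossip_to_settle is not modelled.

```java
/** Sketch only; names are hypothetical and this is not the patch itself. */
final class LiveEndpointSettleSketch
{
    /**
     * Simulates waiting for live endpoints to settle.
     *
     * @param snapshots        {epSize, liveSize} pairs; index 0 is the state read
     *                         before polling starts, each later entry is one poll
     * @param waitLiveMax      stands in for cassandra.gossip_settle_wait_live_max
     *                         (-1 means no limit)
     * @param waitLiveRequired stands in for cassandra.gossip_settle_wait_live_required
     * @return number of polls spent waiting
     */
    static int pollsWaited(int[][] snapshots, int waitLiveMax, int waitLiveRequired)
    {
        int epSize = snapshots[0][0];
        int liveSize = snapshots[0][1];
        int polls = 0;
        int unchanged = 0;
        for (int i = 1; i < snapshots.length; i++)
        {
            if (epSize == liveSize)
                return polls; // every known endpoint is live: settled
            if (waitLiveMax >= 0 && polls >= waitLiveMax)
                return polls; // reached the maximum number of polls
            polls++;
            boolean same = snapshots[i][0] == epSize && snapshots[i][1] == liveSize;
            epSize = snapshots[i][0];
            liveSize = snapshots[i][1];
            // Stable polls only count once at least 1 other endpoint is live.
            unchanged = (same && liveSize > 1) ? unchanged + 1 : 0;
            if (unchanged > 0 && unchanged >= waitLiveRequired)
                return polls;
        }
        return polls; // ran out of simulated polls
    }
}
```

For example, with 3 known endpoints, 2 of them live on every poll, and a
required count of 3, the sketch returns after 3 polls.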
Scenarios considered:
* One node cluster. Will skip this check since epSize == liveSize
* Entire cluster is down and starting up a node. Will wait
cassandra.gossip_settle_wait_live_max polls
* Restarting a node when another node is down. Will wait
cassandra.gossip_settle_wait_live_required polls
* On rare occasions it takes a while to see another node as UP. This is
covered by requiring at least 1 other endpoint to be up (`liveSize > 1`) before
the settlement process starts.
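
As a rough summary of these scenarios, assuming the endpoint counts stay
constant while the node waits, the expected wait can be written as a function
of epSize and liveSize. The class, method and constant below are hypothetical
and only restate the cases listed above; they are not taken from the patch.

```java
/** Sketch only: steady-state view of the scenarios, with hypothetical names. */
final class SettleScenarioSketch
{
    /** Marker for "wait indefinitely" (wait_live_max set to -1). */
    static final int INDEFINITE = Integer.MAX_VALUE;

    /**
     * Expected number of polls when epSize and liveSize never change.
     * waitLiveMax and waitLiveRequired stand in for
     * cassandra.gossip_settle_wait_live_max and
     * cassandra.gossip_settle_wait_live_required.
     */
    static int expectedPolls(int epSize, int liveSize, int waitLiveMax, int waitLiveRequired)
    {
        if (epSize == liveSize)
            return 0; // one-node cluster, or every endpoint already UP
        int cap = waitLiveMax < 0 ? INDEFINITE : waitLiveMax;
        if (liveSize > 1)
            return Math.min(waitLiveRequired, cap); // some other node is UP
        return cap; // no other node seen as UP yet: wait up to the maximum
    }
}
```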
Being opt-in, this doesn't break any existing tests. It is also easier to use
than the reverted patch, as you only need to set
cassandra.gossip_settle_wait_live_max. To restate, the purpose of this patch is
to resolve Native-Transport-Request starting before Cassandra has finished ECHO
requests to other nodes. When that happens, requests fail LOCAL_QUORUM/QUORUM
consistency because the endpoints are not yet considered live for the purposes
of executing requests.
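
To illustrate why such requests fail, here is a minimal sketch of the quorum
arithmetic. The class and method names are hypothetical, and the replication
factor of 3 in the example is an assumption, not a detail from this ticket.

```java
/** Sketch only: quorum arithmetic with hypothetical names. */
final class QuorumSketch
{
    /** A quorum is a majority of replicas: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    /** True when enough replicas are considered live to satisfy a quorum. */
    static boolean canServe(int replicationFactor, int liveReplicas)
    {
        return liveReplicas >= quorum(replicationFactor);
    }
}
```

With a replication factor of 3, a coordinator that has just restarted and so
far considers only itself live sees 1 live replica, which is below the quorum
of 2, even though the other replicas are actually up.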
This comes up every time we do a rolling restart of a large cluster for
security patches and other such operations, where we typically allow only a
single node to be down at a time. With this pull request, the wait for live
endpoints ends as soon as all endpoints are UP, which minimizes the time needed
for a rolling restart while avoiding failed queries that affect clients.
> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
> Key: CASSANDRA-18845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Cameron Zemek
> Priority: Normal
> Attachments: 18845-seperate.patch, delay.log, example.log,
> image-2023-09-14-11-16-23-020.png, stream.log, test1.log, test2.log, test3.log
>
>
> This is a follow-up to CASSANDRA-18543.
> Although that ticket added the ability to set
> cassandra.gossip_settle_min_wait_ms, this is tedious and error-prone. On one
> node I observed a 79-second gap between waiting for gossip and the first echo
> response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip
> settles; otherwise queries can fail consistency levels such as LOCAL_QUORUM
> because the node thinks the replicas are still in the DOWN state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that
> (outside a single-node cluster) we wait for an UP message from another node
> before considering gossip as settled. E.g.:
> {code:java}
> if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
> {
>     logger.debug("Gossip looks settled.");
>     numOkay++;
> }
> {code}