[
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764934#comment-17764934
]
Cameron Zemek commented on CASSANDRA-18845:
-------------------------------------------
[~brandon.williams] [~smiklosovic] the existing conditions
{noformat}
currentSize == epSize && currentLive == liveSize{noformat}
are what stop Native Transport from starting too early while gossip is still
being updated (for example, while liveSize is changing).
By default, waitToSettle waits 5 seconds and then polls every 1 second,
checking whether liveSize or epSize has changed. It needs 3 consecutive polls
with no change, and it resets numOkay whenever either value changes. The
problem arises when, for example, the first change in liveSize takes 79 seconds
to arrive: liveSize stays at 1 the whole time, so waitToSettle decides gossip
is settled because neither epSize nor liveSize changed.
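To make the failure mode concrete, here is a minimal sketch of the polling logic as described above. It is a simplified model, not the Cassandra source: the class name, method name, and the idea of replaying polls from an array (instead of sleeping and reading the Gossiper) are assumptions made for illustration.

```java
// Hypothetical, simplified model of the settle loop described in this comment.
// Polls are replayed from an array of {epSize, liveSize} pairs so the example
// runs instantly; the real code sleeps between polls and reads live state.
final class SettleLoopSketch
{
    // Number of consecutive unchanged polls required ("3 times" above).
    static final int REQUIRED_OK_POLLS = 3;

    /**
     * polls[0] is the state captured before polling starts; each later entry
     * is one poll. Returns the index of the poll at which gossip is declared
     * settled, or -1 if the samples run out first.
     */
    static int settledAtPoll(int[][] polls)
    {
        if (polls.length == 0)
            return -1;

        int epSize = polls[0][0];
        int liveSize = polls[0][1];
        int numOkay = 0;

        for (int i = 1; i < polls.length; i++)
        {
            int currentSize = polls[i][0];
            int currentLive = polls[i][1];

            if (currentSize == epSize && currentLive == liveSize)
                numOkay++;      // nothing changed since the previous poll
            else
                numOkay = 0;    // any change restarts the count

            epSize = currentSize;
            liveSize = currentLive;

            if (numOkay == REQUIRED_OK_POLLS)
                return i;
        }
        return -1;
    }
}
```

With a 50-endpoint view where liveSize stays at 1 for every sample, this model reports "settled" at the third poll, even though no other node has been seen as UP yet.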
The extra condition is therefore: don't consider gossip settled if there is
only 1 live endpoint (the node itself), unless it's a single-node cluster
(epSize == liveSize).
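One way to read that condition as a predicate is sketched below. This is my interpretation of the sentence above, not a copy of the attached patch; the class and method names are hypothetical, and the variable names follow the snippet in the issue description.

```java
// Hypothetical sketch of the proposed settle check; not the attached patch.
final class SettleConditionSketch
{
    /**
     * epSize/liveSize are the values from the previous poll,
     * currentSize/currentLive are the values from this poll.
     */
    static boolean looksSettled(int epSize, int liveSize, int currentSize, int currentLive)
    {
        // Existing condition: nothing changed since the previous poll.
        boolean unchanged = currentSize == epSize && currentLive == liveSize;

        // Extra condition: at least one node besides ourselves is UP ...
        boolean anotherNodeIsUp = currentLive > 1;

        // ... unless every known endpoint is already live, which covers the
        // single-node cluster case (epSize == liveSize == 1).
        boolean allKnownEndpointsLive = currentSize == currentLive;

        return unchanged && (anotherNodeIsUp || allKnownEndpointsLive);
    }
}
```

For example, `looksSettled(50, 1, 50, 1)` is false under this reading (50 endpoints known, only this node live), while `looksSettled(1, 1, 1, 1)` is true.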
> So when there is a cluster of 50 nodes, without this change, that "if" would
> return false (or it would not return true fast enough to increment numOkay to
> break from that while) as there would be new endpoints or live members
> detected each round.
To rephrase: the problem is that there are no new endpoints and no changes in
live members, so waitToSettle currently considers gossip settled with
liveSize == 1.
> why it takes almost minute and a half
This is a good question, but in general gossip takes quite a while to complete
on clusters with multiple datacenters and/or a large number of nodes. I think
that is a different, much more complex JIRA. The purpose of the attached patch
is that you no longer need to guess what value of
cassandra.gossip_settle_min_wait_ms to use. It waits for at least one other
node to be reported as UP before incrementing numOkay and continuing with the
rest of the waitToSettle logic.
!image-2023-09-14-11-16-23-020.png!
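For context, the workaround the patch tries to make unnecessary is tuning the minimum wait by hand. The sketch below shows how a JVM system property such as cassandra.gossip_settle_min_wait_ms could be read with a fallback; the class name is hypothetical, and the 5000 ms fallback is only taken from the "by default 5 seconds" remark above, not from the Cassandra source.

```java
// Hypothetical sketch: reading the minimum settle wait from a system property.
final class SettleMinWaitSketch
{
    static final String PROPERTY = "cassandra.gossip_settle_min_wait_ms";

    // Assumed fallback, based on the "5 seconds" default mentioned above.
    static final long ASSUMED_DEFAULT_MS = 5000L;

    static long minWaitMillis()
    {
        // Long.getLong returns the fallback when the property is unset
        // or cannot be parsed as a long.
        return Long.getLong(PROPERTY, ASSUMED_DEFAULT_MS);
    }
}
```

With the 79 second gap observed here, an operator would have to guess a value above 79000 (for instance `-Dcassandra.gossip_settle_min_wait_ms=90000`, an illustrative number), and a different cluster might need a different guess.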
> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
> Key: CASSANDRA-18845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Cameron Zemek
> Priority: Normal
> Attachments: 18845-3.11.patch, 18845-4.0.patch, 18845-4.1.patch,
> 18845-5.0.patch, image-2023-09-14-11-16-23-020.png
>
>
> This is a follow up to CASSANDRA-18543.
> Although that ticket added the ability to set
> cassandra.gossip_settle_min_wait_ms, this is tedious and error prone. On one
> node I just observed a 79 second gap between waiting for gossip and the first
> echo response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip
> settles; otherwise queries can fail at consistency levels such as
> LOCAL_QUORUM, because the node thinks the replicas are still in the DOWN
> state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that
> (outside a single-node cluster) we wait for an UP message from another node
> before considering gossip as settled. E.g.:
> {code:java}
> if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
> {
>     logger.debug("Gossip looks settled.");
>     numOkay++;
> } {code}