[
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767358#comment-17767358
]
Cameron Zemek edited comment on CASSANDRA-18845 at 9/21/23 2:59 AM:
--------------------------------------------------------------------
{noformat}
Sep 19 08:09:45 ip-10-1-57-23 cassandra[131402]: INFO org.apache.cassandra.gms.Gossiper Waiting for gossip to settle...
Sep 19 08:10:56 ip-10-1-57-23 cassandra[131402]: DEBUG org.apache.cassandra.gms.Gossiper Sending a EchoMessage to /35.83.14.80
{noformat}
I am struggling to reproduce this ^. I have seen it twice, but after enabling
more logging I haven't been able to reproduce it again.
What I do sometimes see, though, is it taking over 30 seconds to get the first
ECHO response. Since there are dtests that rely on having CQL up while nodes
are down, I have attached a patch [^18845-seperate.patch] (against the 5.0
branch) that is opt-in. Having settle check only for currentLive == liveSize
still allows NTR to start while nodes are marked down. Yes, you can increase
cassandra.gossip_settle_poll_success_required (and/or the other properties) to
mitigate this, but doing so increases the minimum startup time, whereas
[^18845-seperate.patch] doesn't add to it when the cluster is healthy.
A more elaborate solution would be to specify the required consistency level
and then, for every token range owned by the node, check that enough endpoints
are live to satisfy that consistency level.
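To make the consistency-level idea concrete, here is a rough sketch of the check. It is not taken from the attached patch and does not use Cassandra's real classes: ReplicaAvailabilitySketch, quorumFor, canSatisfy, and the string range/endpoint labels are all hypothetical, and the quorum arithmetic ignores datacenter locality (LOCAL_QUORUM would count only local replicas).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from Cassandra or the patch. */
final class ReplicaAvailabilitySketch {

    /** Live replicas a simple quorum needs for the given replication factor. */
    static int quorumFor(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * Returns true only if every range has at least `required` live replicas.
     * `replicasByRange` maps a range label to the endpoints that replicate it.
     */
    static boolean canSatisfy(Map<String, List<String>> replicasByRange,
                              Set<String> liveEndpoints,
                              int required) {
        for (List<String> replicas : replicasByRange.values()) {
            long live = replicas.stream().filter(liveEndpoints::contains).count();
            if (live < required) {
                return false; // this range cannot meet the consistency level yet
            }
        }
        return true;
    }
}
```

With a check along these lines, startup would hold off on accepting client requests until canSatisfy returns true for the configured level, instead of relying only on a fixed wait.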
> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
> Key: CASSANDRA-18845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Cameron Zemek
> Priority: Normal
> Attachments: 18845-seperate.patch, delay.log, example.log,
> image-2023-09-14-11-16-23-020.png, stream.log, test1.log, test2.log, test3.log
>
>
> This is a follow-up to CASSANDRA-18543.
> Although that ticket added the ability to set
> cassandra.gossip_settle_min_wait_ms, tuning it is tedious and error prone. On
> one node I just observed a 79 second gap between waiting for gossip and the
> first echo response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip
> settles; otherwise queries can fail at consistency levels such as
> LOCAL_QUORUM because the node thinks the replicas are still in the DOWN state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that
> (outside a single-node cluster) we wait for an UP message from another node
> before considering gossip settled, e.g.:
> {code:java}
> if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
> {
>     logger.debug("Gossip looks settled.");
>     numOkay++;
> }
> {code}
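For illustration, the proposed condition can be exercised in a small standalone model of a settle loop. This is only a sketch under assumptions: GossipSettleSketch, looksSettled, and pollsUntilSettled are hypothetical names, the loop is fed pre-recorded {endpointCount, liveCount} samples instead of sleeping and querying the Gossiper, and single-node clusters (which the proposal excludes) are not handled.

```java
/** Hypothetical model of a settle loop; not Cassandra's actual implementation. */
final class GossipSettleSketch {

    /** The condition from the ticket: counts are stable and more than one endpoint is live. */
    static boolean looksSettled(int currentSize, int epSize, int currentLive, int liveSize) {
        return currentSize == epSize && currentLive == liveSize && liveSize > 1;
    }

    /**
     * Replays recorded polls, each sample being {endpointCount, liveCount}.
     * Returns the number of polls needed to reach `successesRequired`
     * consecutive settled polls, or -1 if the samples run out first.
     */
    static int pollsUntilSettled(int[][] samples, int successesRequired) {
        int epSize = 0;
        int liveSize = 0;
        int numOkay = 0;
        for (int poll = 0; poll < samples.length; poll++) {
            int currentSize = samples[poll][0];
            int currentLive = samples[poll][1];
            if (looksSettled(currentSize, epSize, currentLive, liveSize)) {
                numOkay++;
            } else {
                numOkay = 0; // any change, or no live peer yet, resets the streak
            }
            epSize = currentSize;
            liveSize = currentLive;
            if (numOkay >= successesRequired) {
                return poll + 1;
            }
        }
        return -1;
    }
}
```

In this model a node whose live count stays at 1 never settles, no matter how many polls pass, which is the behaviour the extra liveSize > 1 clause is meant to give while no peer has been marked UP.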