[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764467#comment-17764467
 ] 

Cameron Zemek edited comment on CASSANDRA-18845 at 9/13/23 3:32 AM:
--------------------------------------------------------------------

I have attached a patch. I tested it as follows:
 # Spin up a single-node cluster. This works due to the epSize == liveSize 
check, which lets it bypass the liveSize > 1 check.
 # Spin up a 3-node cluster. All 3 nodes start up NTR as expected.
 # Shut down all nodes. Start up the first node; it stays waiting in gossip due 
to the liveSize > 1 requirement.
 # Start up the second node. Now both nodes start NTR, since liveSize > 1 and 
there are no other incoming `is now UP` events, so gossip looks settled.

NOTE: I had to disable the if condition around the call to 
Gossiper.waitToSettle() since I was using loopback addresses.
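
The four test scenarios above can be checked against a small model of the settle condition. This is only a sketch and not the code from the attached patch: the class name GossipSettleSketch and the method looksSettled are hypothetical, the way the epSize == liveSize bypass is combined with the liveSize > 1 check is an assumption based on the description in this comment, and both counts are assumed to include the local node.

```java
// Hypothetical model of one gossip-settle poll; NOT the code from 18845-3.11.patch.
final class GossipSettleSketch
{
    /**
     * Returns true when a single poll would count as "okay".
     *
     * epSize / liveSize       counts seen on the previous poll
     * currentSize / currentLive  counts seen on this poll
     *
     * Assumption: both counts include the local node, so a single-node
     * cluster reports epSize == liveSize == 1.
     */
    static boolean looksSettled(int epSize, int liveSize, int currentSize, int currentLive)
    {
        // Nothing changed between two consecutive polls.
        boolean stable = currentSize == epSize && currentLive == liveSize;
        // Either every known endpoint is live (covers the single-node case),
        // or at least one other node has been marked UP.
        boolean enoughLive = epSize == liveSize || liveSize > 1;
        return stable && enoughLive;
    }

    public static void main(String[] args)
    {
        // 1. Single-node cluster.
        System.out.println("single node:       " + looksSettled(1, 1, 1, 1));
        // 2. Three-node cluster, all nodes up.
        System.out.println("3 nodes up:        " + looksSettled(3, 3, 3, 3));
        // 3. First node restarted after a full shutdown: knows 3 endpoints, only itself live.
        System.out.println("first of 3 back:   " + looksSettled(3, 1, 3, 1));
        // 4. Second node restarted: two live endpoints, no further changes.
        System.out.println("second of 3 back:  " + looksSettled(3, 2, 3, 2));
    }
}
```

In this model only the third scenario returns false, which matches the node staying in the gossip wait until another node is seen as UP.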


> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Cameron Zemek
>            Priority: Normal
>         Attachments: 18845-3.11.patch
>
>
> This is a follow-up to CASSANDRA-18543.
> Although that ticket added the ability to set 
> cassandra.gossip_settle_min_wait_ms, doing so is tedious and error-prone. On 
> one node I just observed a 79-second gap between waiting for gossip and the 
> first echo response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip 
> settles; otherwise queries can fail at consistency levels such as 
> LOCAL_QUORUM because the node thinks the replicas are still in the DOWN state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that 
> (outside a single-node cluster) we wait for an UP message from another node 
> before considering gossip settled. E.g.
> {code:java}
>             if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
>             {
>                 logger.debug("Gossip looks settled.");
>                 numOkay++;
>             }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
