[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2024-04-11 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836370#comment-17836370
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

I have reworked the patch further so that it adds a new method instead of 
modifying the existing waitToSettle, which keeps changes to existing behavior 
to a minimum. It is called directly in MigrationCoordinator::awaitSchemaRequests 
to handle the case where the node is bootstrapping (since it needs nodes in the 
UP state in order to get the schema and stream sstables from them), and again 
just before enabling native transport. 
https://issues.apache.org/jira/secure/attachment/13068153/CASSANDRA-18845-4_0_12.patch

> Waiting for gossip to settle on live endpoints
> --
>
> Key: CASSANDRA-18845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Cameron Zemek
>Priority: Normal
> Attachments: 18845-seperate.patch, CASSANDRA-18845-4_0_12.patch, 
> delay.log, example.log, image-2023-09-14-11-16-23-020.png, stream.log, 
> test1.log, test2.log, test3.log
>
>
> This is a follow-up to CASSANDRA-18543.
> Although that ticket added the ability to set 
> cassandra.gossip_settle_min_wait_ms, this is tedious and error-prone. On one 
> node I just observed a 79-second gap between waiting for gossip and the first 
> echo response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip 
> settles; otherwise queries can fail consistency levels such as LOCAL_QUORUM 
> because the node thinks the replicas are still in the DOWN state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that 
> (outside of a single-node cluster) the node waits for an UP message from 
> another node before considering gossip as settled. E.g.:
> {code:java}
>             if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
>             {
>                 logger.debug("Gossip looks settled.");
>                 numOkay++;
>             } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-10-05 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772415#comment-17772415
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

I have reworked the patch into a pull request: [Wait for live endpoints as 
part of waiting for gossip to settle by grom358 · Pull Request #2778 · 
apache/cassandra (github.com)|https://github.com/apache/cassandra/pull/2778]. 
Created the PR against 4.1 since 5.x is not as stable.

I have not yet got around to writing an automated test for this. The patch has 
the following behaviors:
 * Must opt-in by setting cassandra.gossip_settle_wait_live_max
 * Waits up to maximum number of polls defined by 
cassandra.gossip_settle_wait_live_max . Set to -1 to wait indefinitely.
 * cassandra.skip_wait_for_gossip_to_settle still applies to cap the maximum 
number of polls.
 * cassandra.gossip_settle_wait_live_required determines how many polls in a 
row without change to live endpoint state to consider gossip as settled once 
opt-in via cassandra.gossip_settle_wait_live_max
 * If live endpoint size equals number of endpoints, consider live endpoints as 
settled.
 * Requires at least 1 other live endpoint to begin considering live endpoints 
as settled.
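
The behaviors listed above can be sketched roughly as the loop below. This is a toy model and not the code from the pull request: the class and method names, the use of suppliers in place of Gossiper state, and the default of 3 for cassandra.gossip_settle_wait_live_required are all assumptions made for illustration. Only the two property names come from this comment.

```java
import java.util.function.IntSupplier;

/** Toy model of the opt-in live-endpoint wait; not the code from the PR. */
class LiveEndpointSettleSketch {

    /**
     * Returns how many polls were spent waiting.
     * maxPolls < 0 means wait indefinitely; required is the number of
     * consecutive polls with an unchanged live endpoint count needed to settle.
     */
    static int waitForLiveEndpoints(IntSupplier epSize, IntSupplier liveSize,
                                    int maxPolls, int required) {
        int previousLive = liveSize.getAsInt();
        int stablePolls = 0;
        int polls = 0;
        while (maxPolls < 0 || polls < maxPolls) {
            int live = liveSize.getAsInt();
            if (live == epSize.getAsInt()) {
                return polls; // all known endpoints are live (covers a one-node cluster)
            }
            polls++; // a real loop would sleep for the poll interval here
            if (live > 1 && live == previousLive) {
                stablePolls++;
                if (stablePolls >= required) {
                    return polls; // another node is UP and nothing changed for a while
                }
            } else {
                stablePolls = 0; // still alone, or the live count changed: start over
            }
            previousLive = live;
        }
        return polls; // gave up after maxPolls
    }

    public static void main(String[] args) {
        // Opt-in: only wait when the property is set.
        Integer maxPolls = Integer.getInteger("cassandra.gossip_settle_wait_live_max");
        if (maxPolls == null) {
            System.out.println("not opted in, skipping wait for live endpoints");
            return;
        }
        // The default of 3 is an assumption for this sketch.
        int required = Integer.getInteger("cassandra.gossip_settle_wait_live_required", 3);
        // Fixed counts stand in for Gossiper state: 5 endpoints known, 4 live.
        int polls = waitForLiveEndpoints(() -> 5, () -> 4, maxPolls, required);
        System.out.println("waited " + polls + " polls");
    }
}
```

In this model a one-node cluster returns immediately, a node that never sees another endpoint as UP waits the full maximum, and a restart with one other node down waits only the required number of stable polls.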

Scenarios considered:
 * One node cluster. Will skip this check since epSize == liveSize
 * Entire cluster is down and starting up a node. Will wait 
cassandra.gossip_settle_wait_live_max polls
 * Restarting a node when another node is down. Will wait 
cassandra.gossip_settle_wait_live_required polls
 * On rare occasions it takes a while to see another node as UP. This is 
covered by requiring at least 1 other endpoint as up `liveSize > 1` to start 
the settlement process.

Being opt-in, this doesn't break any existing tests. It is also easier to use 
than the reverted patch, as you only need to set 
cassandra.gossip_settle_wait_live_max. To restate, the purpose of this patch is 
to stop Native-Transport-Request handling from starting before Cassandra has 
finished its ECHO requests to other nodes. Otherwise requests fail 
LOCAL_QUORUM/QUORUM consistency because the endpoints are not yet considered 
live for the purposes of executing requests.

This comes up every time we do rolling restarts of large clusters for security 
patches and other such operations, where we typically allow only a single node 
to be down at a time. With this pull request, the wait for live endpoints ends 
once all endpoints are UP, which minimizes the time needed for rolling restarts 
while avoiding failed queries and impact on clients.




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-21 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767418#comment-17767418
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

The "separate patch" makes sense to me. I like the fact that we are not 
changing what was there but just adding on top of it, so the original logic is 
untouched. It would keep things as they were, but you would also be covered if 
you have special requirements, e.g. you are waiting for all nodes to be marked 
as live so you have some level of certainty that CQL requests will not fail 
afterwards.

I am still missing a comprehensive test, e.g. an in-jvm dtest. I could probably 
help you with that, but I can already imagine that nailing down this scenario 
precisely and consistently, so that it is repeatable, might be a little bit 
challenging. 




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-20 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767361#comment-17767361
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

{noformat}
Sep 21 03:01:42 ip-10-1-32-228 cassandra[52927]: INFO  
org.apache.cassandra.gms.Gossiper Waiting for gossip to settle...
Sep 21 03:01:48 ip-10-1-32-228 cassandra[52927]: INFO  
org.apache.cassandra.gms.Gossiper Gossip looks settled. epSize=108
Sep 21 03:01:49 ip-10-1-32-228 cassandra[52927]: INFO  
org.apache.cassandra.gms.Gossiper Gossip looks settled. epSize=108
Sep 21 03:01:50 ip-10-1-32-228 cassandra[52927]: INFO  
org.apache.cassandra.gms.Gossiper Gossip looks settled. epSize=108
Sep 21 03:02:00 ip-10-1-32-228 cassandra[52927]: INFO  
o.a.c.gms.GossipDigestAckVerbHandler Received a GossipDigestAckMessage from 
/15.223.140.86
Sep 21 03:02:00 ip-10-1-32-228 cassandra[52927]: INFO  
org.apache.cassandra.gms.Gossiper Sending a EchoMessage to /44.229.153.229
...
Sep 21 03:03:40 ip-10-1-32-228 cassandra[52927]: INFO  
org.apache.cassandra.gms.Gossiper InetAddress /44.229.153.229 is now 
UP{noformat}
Got a test run with an 18-second delay. 




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-20 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767358#comment-17767358
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

 
{noformat}
Sep 19 08:09:45 ip-10-1-57-23 cassandra[131402]: INFO  
org.apache.cassandra.gms.Gossiper Waiting for gossip to settle...
Sep 19 08:10:56 ip-10-1-57-23 cassandra[131402]: DEBUG 
org.apache.cassandra.gms.Gossiper Sending a EchoMessage to 
/35.83.14.80{noformat}
I am struggling to reproduce this ^. I have seen it twice, but after enabling 
more logging I haven't been able to reproduce it again.

 

What I do sometimes see, though, is it taking over 30 seconds to get the first 
ECHO response. Since there are dtests that rely on having CQL up while nodes 
are down, I have attached a patch [^18845-seperate.patch] (against the 5.0 
branch) that is opt-in. Having settle only check for currentLive == liveSize 
still allows NTR to start while nodes are marked down. Yes, you can increase 
cassandra.gossip_settle_poll_success_required (and/or the other properties) to 
mitigate it, but these increase the minimum startup time, whereas 
[^18845-seperate.patch] doesn't add to it when the cluster is healthy.

 

A more elaborate solution would be to specify the required consistency level. 
And for all token ranges owned by the node you check if you have the needed 
live endpoints to satisfy the consistency level.
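
As a rough illustration of that more elaborate check, the sketch below tests whether every range has enough live replicas for QUORUM (replication factor / 2 + 1). It is not part of any attached patch: the class name, the string keys for ranges and endpoints, and the map layout are hypothetical stand-ins for the node's token metadata.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy readiness check per range; names and data layout are hypothetical. */
class QuorumReadinessSketch {

    /** Live replicas needed for QUORUM at the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True if every range has at least a quorum of its replicas live. */
    static boolean canServeQuorum(Map<String, List<String>> replicasByRange,
                                  Set<String> liveEndpoints) {
        for (List<String> replicas : replicasByRange.values()) {
            long live = replicas.stream().filter(liveEndpoints::contains).count();
            if (live < quorum(replicas.size())) {
                return false; // this range cannot satisfy QUORUM yet
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two ranges owned by this node, each with replication factor 3.
        Map<String, List<String>> ranges = Map.of(
                "range-a", List.of("n1", "n2", "n3"),
                "range-b", List.of("n1", "n3", "n4"));
        System.out.println(canServeQuorum(ranges, Set.of("n1")));
        System.out.println(canServeQuorum(ranges, Set.of("n1", "n3")));
    }
}
```

A startup gate built this way would start native transport as soon as the chosen consistency level is satisfiable, rather than waiting for every endpoint to be UP.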




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-20 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767052#comment-17767052
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

With this removed
{code:java}
(epSize == liveSize || liveSize > 1){code}
the j11_dtests just passed. [j11_dtests (120384) - instaclustr/cassandra 
(circleci.com)|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3180/workflows/2f7e6199-d865-4eee-a3b1-9511a4c88a45/jobs/120384]




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-20 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767024#comment-17767024
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

I am retracting my note about loopback addresses. waitToSettle is called 
(empirically tested), because in the guard

{code}
if (!FBUtilities.getBroadcastAddressAndPort().equals(InetAddressAndPort.getLoopbackAddress()))
    Gossiper.waitToSettle();
{code}

the expression 
"FBUtilities.getBroadcastAddressAndPort().equals(InetAddressAndPort.getLoopbackAddress())"
 evaluates to false, since the broadcast address is 127.0.0.2 and the loopback 
address is 127.0.0.1. So it does call waitToSettle.
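
The distinction can be reproduced with plain java.net.InetAddress, as a minimal sketch. The class and helper names below are hypothetical, and the loopback address is modeled as a literal 127.0.0.1 rather than by calling Cassandra's InetAddressAndPort.getLoopbackAddress().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Shows that an equals() guard only matches the single loopback address. */
class LoopbackGuardSketch {

    /** Builds an IPv4 address from four octets without a DNS lookup. */
    static InetAddress ipv4(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] { (byte) a, (byte) b, (byte) c, (byte) d });
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e); // cannot happen for a 4-byte array
        }
    }

    /** Mirrors the shape of the guard: wait unless broadcast equals the loopback address. */
    static boolean wouldWaitToSettle(InetAddress broadcast, InetAddress loopback) {
        return !broadcast.equals(loopback);
    }

    public static void main(String[] args) {
        InetAddress loopback = ipv4(127, 0, 0, 1);
        InetAddress secondNode = ipv4(127, 0, 0, 2);
        // 127.0.0.2 is inside the loopback range, yet it is not equal to 127.0.0.1.
        System.out.println(secondNode.isLoopbackAddress());
        System.out.println(wouldWaitToSettle(secondNode, loopback));
        System.out.println(wouldWaitToSettle(loopback, loopback));
    }
}
```

So a node in a multi-node loopback cluster that broadcasts on 127.0.0.2 or higher passes a guard of this shape, while a node on 127.0.0.1 skips the wait.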




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-20 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767007#comment-17767007
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

[^stream.log] Without this patch, nodes get stuck unable to join a large test 
cluster:
{noformat}
Sep 20 01:18:51 ip-10-7-20-120 cassandra[5521]: INFO  
o.a.cassandra.service.StorageService JOINING: Starting to bootstrap...
Sep 20 01:18:51 ip-10-7-20-120 cassandra[5521]: Exception 
(java.lang.RuntimeException) encountered during startup: A node required to 
move the data consistently is down (/13.237.60.255). If you wish to move the 
data from a potentially inconsistent replica, restart the node with 
-Dcassandra.consistent.rangemovement=false
Sep 20 01:18:51 ip-10-7-20-120 cassandra[5521]: java.lang.RuntimeException: A 
node required to move the data consistently is down (/13.237.60.255). If you 
wish to move the data from a potentially inconsistent replica, restart the node 
with -Dcassandra.consistent.rangemovement=false
Sep 20 01:18:51 ip-10-7-20-120 cassandra[5521]:         at 
org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:294){noformat}
The node is in an endless restart cycle (since our service keeps retrying), 
reporting a different IP each time. 




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-20 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767125#comment-17767125
 ] 

Brandon Williams commented on CASSANDRA-18845:
--

I don't have time to look at this fully but one thing you may want to do is 
something I did on CASSANDRA-18792 to find the issue, which is add more 
debugging around the echoes and push it up to debug so I didn't have to cloud 
everything with TRACE. 
https://github.com/driftx/cassandra/commit/e1e6b1a0fb0dacc067ddc5910659e1fe6da2cd52




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-20 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767037#comment-17767037
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

the 
{noformat}
(epSize == liveSize || liveSize > 1){noformat}
part breaks dtests. For example, 
{noformat}
pytest --force-resource-intensive-tests 
--cassandra-dir=/home/grom/dev/cassandra 
materialized_views_test.py::TestMaterializedViews::test_throttled_partition_update{noformat}
This test fails because it shuts down a 5-node cluster and then starts and 
stops each node one at a time, so liveSize > 1 is never true.
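
To make the failure concrete, here is the condition on its own as a small sketch. The class and method names are hypothetical; only the boolean expression comes from the patch.

```java
/** The extra settle condition in isolation; names are hypothetical. */
class SettleGateSketch {

    /** True when live endpoints allow gossip to be counted as settled. */
    static boolean liveEndpointsOk(int epSize, int liveSize) {
        return epSize == liveSize || liveSize > 1;
    }

    public static void main(String[] args) {
        // 5 known endpoints, only this node live: the dtest scenario.
        System.out.println(liveEndpointsOk(5, 1));
        // Single-node cluster: the node itself is the only endpoint.
        System.out.println(liveEndpointsOk(1, 1));
        // 5 known endpoints, this node plus one other live.
        System.out.println(liveEndpointsOk(5, 2));
    }
}
```

With five known endpoints and only the local node live, the condition stays false on every poll, so the settle counter never advances.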

Possible paths forward:
 # The check for waiting for other nodes is off by default and requires setting 
a system property.
 # Figure out why there is this large delay between the waitToSettle call and 
getting ECHO responses.
 # Have the tests override cassandra.skip_wait_for_gossip_to_settle.
 # ?? Some other option I haven't thought of yet.




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-19 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766949#comment-17766949
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

Still running, but sharing the results so far:
{noformat}
$ pytest --count=500 --cassandra-dir=/home/grom/dev/cassandra 
transient_replication_ring_test.py::TestTransientReplicationRing::test_move_forwards_between_and_cleanup
/home/grom/dtest/lib/python3.10/site-packages/ccmlib/common.py:773: 
DeprecationWarning: distutils Version classes are deprecated. Use 
packaging.version instead.
  return LooseVersion(match.group(1))
== test session starts ==
platform linux -- Python 3.10.12, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/grom/tmp/cassandra-dtest
configfile: pytest.ini
plugins: repeat-0.9.1, flaky-3.7.0, timeout-1.4.2
timeout: 900.0s
timeout method: signal
timeout func_only: False
collected 500 items

transient_replication_ring_test.py ... [ 11%]
{noformat}




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-19 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766917#comment-17766917
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

I just noticed this in your comment above:

_This is going to be very difficult to do. dtests set up clusters on loopback 
addresses and the waitToSettle code path has a guard against it if using a 
loopback address. Also, the problems mostly become apparent with large clusters._

This is indeed true (1): Gossiper.waitToSettle is called only when the node is 
not on loopback. Since our dtests are all on loopback (right?), I do not think 
that code was ever invoked during dtests, so its revert was not necessary. 

_If I redo the patch and remove the changes to ECHO and show those tests do not 
have regression would this allow the ticket to move forward?_

I think that is reasonable, wdyt, [~brandon.williams]?

I think what was unfortunate was that we mixed the solution for the flood of 
echos with waiting for at least one node to be up. I think that waiting for at 
least 1 node can go in and we will focus on the echos separately. 

(1) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L400-L401
(2) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/BootStrapper.java#L213-L214
(3) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/BootStrapper.java#L235-L236
(4) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/MigrationCoordinator.java#L696-L697






[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-19 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766850#comment-17766850
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

[~cam1982] you can simulate a lost echo even in a setup with 2 nodes. This is 
definitely possible with in-jvm dtests. You can drop all communication 
between nodes like this (1)

(1) 
https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/test/AuthTest.java#L99-L101




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-19 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766694#comment-17766694
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

[^delay.log]

Attached a log from a 105-node test cluster that shows the delay between 
starting to wait for gossip and getting replies back for UP.

Snippet
{noformat}
Sep 19 08:09:45 ip-10-1-57-23 cassandra[131402]: INFO  
org.apache.cassandra.gms.Gossiper Waiting for gossip to settle...
Sep 19 08:10:56 ip-10-1-57-23 cassandra[131402]: DEBUG 
org.apache.cassandra.gms.Gossiper Sending a EchoMessage to /35.83.14.80
Sep 19 08:10:57 ip-10-1-57-23 cassandra[131402]: INFO  
org.apache.cassandra.gms.Gossiper InetAddress /54.149.62.104 is now UP{noformat}
So the delay is in sending out the Echo.
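
To put numbers on the snippet above, here is a minimal sketch that computes the gaps between the three timestamps. The class and method names are hypothetical and only the timestamps come from the log; it assumes all three lines fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;

public class EchoDelay
{
    // Seconds elapsed between two same-day log timestamps in HH:mm:ss form.
    public static long gapSeconds(String from, String to)
    {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).getSeconds();
    }

    public static void main(String[] args)
    {
        String waitStart = "08:09:45"; // "Waiting for gossip to settle..."
        String firstEcho = "08:10:56"; // first "Sending a EchoMessage" in the snippet
        String firstUp   = "08:10:57"; // first "is now UP" in the snippet

        System.out.println("wait start -> first echo sent: " + gapSeconds(waitStart, firstEcho) + "s");
        System.out.println("first echo sent -> first UP:   " + gapSeconds(firstEcho, firstUp) + "s");
    }
}
```

Almost all of the gap sits before the first Echo is sent; once echoes go out, the first UP follows about a second later.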




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-19 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766677#comment-17766677
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

[^test1.log]

[^test2.log]

[^test3.log]

Tested the patch 3 times to confirm it works.




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-18 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766628#comment-17766628
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

[Cassandra 18845 3.11 by grom358 · Pull Request #2701 · apache/cassandra 
(github.com)|https://github.com/apache/cassandra/pull/2701]

[Cassandra 18845 4.0 by grom358 · Pull Request #2702 · apache/cassandra 
(github.com)|https://github.com/apache/cassandra/pull/2702]

[Cassandra 18845 4.1 by grom358 · Pull Request #2703 · apache/cassandra 
(github.com)|https://github.com/apache/cassandra/pull/2703]

[Cassandra 18845 5.0 by grom358 · Pull Request #2704 · apache/cassandra 
(github.com)|https://github.com/apache/cassandra/pull/2704]




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-17 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766192#comment-17766192
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

CASSANDRA-18543 had 3 components:
 # Allow for overriding the values used in waitToSettle
 # Make waitToSettle also consider the liveEndpoint members as part of settling.
 # Changes to handling of ECHO requests to remove duplicate inflight ECHO and 
duplicate log messages about the same node going into UP state 'is now UP'

 

With the revert in CASSANDRA-18854, did the changes to waitToSettle need to 
be reverted? The problem seems to be the changes to ECHO. 

 

> The next step for this ticket to move forward will be to create tests that 
> demonstrate the problem and guard against regressions.

This is going to be very difficult to do. dtests set up clusters on loopback 
addresses, and the waitToSettle code path has a guard against running when a 
loopback address is in use. Also, the problems mostly become apparent with 
large clusters.

If I redo the patch, remove the changes to ECHO, and show those tests do not 
regress, would this allow the ticket to move forward?

I am also in the process of setting up a large test cluster. 

[^example.log] shows an example of what happens without the patched 
waitToSettle. Gossip settles before nodes have finished being marked as UP.




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-15 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765577#comment-17765577
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

That is unfortunate. We should probably focus more on finding out what is 
causing these long initial delays and how to remediate them, rather than 
applying various band-aids (even though done in good faith). 




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765558#comment-17765558
 ] 

Brandon Williams commented on CASSANDRA-18845:
--

CASSANDRA-18543 is going to be reverted for causing a regression.  The next 
step for this ticket to move forward will be to create tests that demonstrate 
the problem and guard against regressions.




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-14 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765429#comment-17765429
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

I need to do more investigating around the slowness. I suspect it's due to the 
flood of gossip messages on startup. The previous patch, CASSANDRA-18543, 
removed the duplicate ECHO messages to cut down on this.

The behavior I notice in production, though, is that there is a large initial 
delay (> 10 seconds) before any nodes are marked as `is now UP`, and then they 
flood in. On large clusters it takes over a minute to finish receiving them all. 
Prior to CASSANDRA-18543 it never checked liveSize at all and so would start 
up regardless of the UP status of nodes. With that change, assuming the polling 
starts while UP statuses are being received, it waits. So the problem now is 
waiting for that initial event.

The previous patch from CASSANDRA-18543 allowed for overriding the gossip 
parameters, but in hindsight it's difficult to determine a suitable default for 
that initial wait as it's not consistent. The algorithm in waitToSettle relies 
on seeing a change in these values, so if that initial delay is greater than the 
wait time plus the polling phase, it will move on and start NTR even though we 
have yet to see any nodes as UP.

You are correct that even with this proposed patch it's still possible to start 
NTR too early, e.g. if one node reports UP but the delay before the next event 
is longer than the polling period. I am not seeing that in production so far, 
though. Therefore, the purpose of this patch is to have it wait for the first 
`is now UP` from a node instead of relying on 
cassandra.gossip_settle_min_wait_ms.
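
To make the "wait time plus the polling phase" arithmetic concrete, here is a minimal sketch. The class and method names are hypothetical; the 5 second wait, 1 second poll interval and 3 required polls are the defaults described elsewhere in this ticket, and the sketch assumes each poll is preceded by one poll interval of sleep.

```java
public class SettleWindow
{
    // Defaults as described in this ticket (assumed here, not read from Cassandra).
    static final long MIN_WAIT_MS = 5_000;
    static final long POLL_INTERVAL_MS = 1_000;
    static final int POLLS_REQUIRED = 3;

    // Earliest point at which the unpatched loop can declare gossip settled
    // when epSize and liveSize never change.
    public static long earliestSettleMs(long minWaitMs, long pollIntervalMs, int pollsRequired)
    {
        return minWaitMs + pollIntervalMs * pollsRequired;
    }

    // True if the first "is now UP" arrives only after the loop could already have finished.
    public static boolean startsTooEarly(long firstUpDelayMs, long minWaitMs)
    {
        return firstUpDelayMs > earliestSettleMs(minWaitMs, POLL_INTERVAL_MS, POLLS_REQUIRED);
    }

    public static void main(String[] args)
    {
        System.out.println(earliestSettleMs(MIN_WAIT_MS, POLL_INTERVAL_MS, POLLS_REQUIRED)); // 8000
        System.out.println(startsTooEarly(79_000, MIN_WAIT_MS)); // true: 79 s gap vs. 8 s window
        System.out.println(startsTooEarly(79_000, 80_000));      // false, but only because 80 s was guessed
    }
}
```

The last line shows why tuning cassandra.gossip_settle_min_wait_ms is error prone: the value only helps if it happens to exceed a delay that is not consistent from one start to the next.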




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-14 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765035#comment-17765035
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

Interesting. I am curious what causes that initial delay. What you are saying 
is that it takes a lot of time for the nodes to be up and then it appears (from 
the log you posted) that all of them are reported at more or less the same 
time? There is an initial delay of dozens of seconds before they start to get 
reported? If that is true, then it probably makes sense to have a condition like 
that, so that we see at least some other nodes up before counting a round and 
increasing numOkay.

However, if we have this 
{code:java}
if (currentSize == epSize && currentLive == liveSize && (epSize == liveSize || 
liveSize > 1))
{code}

Then what if we have 

{code}
currentSize = 2 , epSize = 2, currentLive = 2, liveSize = 2
{code}

That "if" would return true, so numOkay would be increased and it would count 
as a valid round.

However, and it is a little bit hard to formulate this correctly, is it not 
true that we are not guaranteeing that QUORUM would be satisfied here anyway? 
It could stay on all "twos" for all rounds and we would say that gossip 
settled while there is a bunch of other nodes still to be reported; they just 
have not made it yet and we were stuck on 2 for three rounds.
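
The "stuck on twos" case can be checked mechanically. The sketch below is only an illustration: the class and method names are hypothetical, and the condition is copied from the proposed "if" quoted above.

```java
public class SettledCheck
{
    // The proposed condition, with the same variable names as in the comment above.
    public static boolean looksSettled(int currentSize, int epSize, int currentLive, int liveSize)
    {
        return currentSize == epSize && currentLive == liveSize && (epSize == liveSize || liveSize > 1);
    }

    // Number of consecutive okay rounds when every round observes the same values.
    public static int okayRounds(int size, int live, int rounds)
    {
        int numOkay = 0;
        for (int i = 0; i < rounds; i++)
            numOkay = looksSettled(size, size, live, live) ? numOkay + 1 : 0;
        return numOkay;
    }

    public static void main(String[] args)
    {
        System.out.println(okayRounds(2, 2, 3));  // 3: all "twos" counts as settled
        System.out.println(okayRounds(50, 2, 3)); // 3: 50 known endpoints, only 2 live, still settled
        System.out.println(okayRounds(50, 1, 3)); // 0: only the node itself is live
    }
}
```

So the extra condition rules out the "only myself is live" case, but it does not by itself say anything about whether enough replicas are up for QUORUM.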






[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-13 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764934#comment-17764934
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

[~brandon.williams] [~smiklosovic]  the existing conditions 
{noformat}
currentSize == epSize && currentLive == liveSize{noformat}
are what stops it starting Native Transport too early if gossip is still being 
updated (for example liveSize is changing). 

waitToSettle waits 5 seconds by default, then polls every 1 second, needing 3 
polls in which neither liveSize nor epSize changes; numOkay is reset if either 
of them changes. The problem is that when, for example, it took 79 seconds for 
that first change in liveSize, liveSize was constantly at 1, so it decides 
gossip is settled because there were no changes in epSize or liveSize.

The extra condition therefore is: don't consider gossip settled if there is 
only 1 live endpoint (the node itself), unless it's a single node cluster 
(epSize == liveSize).

 

> So when there is a cluster of 50 nodes, without this change, that "if" would 
> return false (or it would not return true fast enough to increment numOkay to 
> break from that while) as there would be new endpoints or live members 
> detected each round.

To rephrase, the problem is that there are no new endpoints or live member 
changes. waitToSettle will currently consider it settled with liveSize == 1. 

 

> why it takes almost minute and a half

This is a good question, but in general it takes quite a while for gossip to 
complete on clusters with multiple datacenters and/or a large number of nodes. I 
think that is a different, much more complex JIRA. The purpose of the attached 
patch is so you don't need to guess what cassandra.gossip_settle_min_wait_ms to 
use. It waits for at least one node to report `is now UP` before incrementing 
numOkay and continuing with the rest of the waitToSettle logic.

 

!image-2023-09-14-11-16-23-020.png!




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-13 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764726#comment-17764726
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

Yeah, like ... if there are 20 nodes, RF is 5 and QUORUM is 3, then "liveSize > 
1" means at least 2. But how do we know that these "2" satisfy _each query on 
local quorum_? Maybe there is a query whose quorum requires live nodes that 
have not been detected yet, or maybe I am missing something here.
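
The numbers in that example can be sketched as follows. The names are hypothetical and the check is only a necessary condition, not a sufficient one: it asks whether enough nodes are live for a quorum to be possible at all, without looking at which nodes actually hold the replicas.

```java
public class QuorumCheck
{
    // Quorum is a strict majority of the replicas.
    public static int quorum(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    // Best case: every live node is a replica of the partition, capped at RF.
    // If even that falls short of quorum, no quorum query can succeed.
    public static boolean quorumPossible(int replicationFactor, int liveNodes)
    {
        return Math.min(liveNodes, replicationFactor) >= quorum(replicationFactor);
    }

    public static void main(String[] args)
    {
        System.out.println(quorum(5));            // 3
        System.out.println(quorumPossible(5, 2)); // false: liveSize == 2 cannot reach quorum 3
        System.out.println(quorumPossible(5, 3)); // true, but only if all 3 live nodes are replicas
    }
}
```

So with RF 5, "liveSize > 1" can be satisfied at a point where no quorum read or write could succeed yet.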




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-13 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764724#comment-17764724
 ] 

Brandon Williams commented on CASSANDRA-18845:
--

bq.  that we consider the gossip to be settled as soon as there is more than 1 
live endpoint

That would seem to cause:

bq. do not want to start Native Transport until gossip settles otherwise 
queries can fail consistency such as LOCAL_QUORUM 




[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-13 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764707#comment-17764707
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

As I looked at this more closely, I realized I do not understand it either. 

If we want to add this:
{code:java}
if (currentSize == epSize && currentLive == liveSize && (epSize == liveSize || 
liveSize > 1)) {code}
When it was like this:
{code:java}
if (currentSize == epSize && currentLive == liveSize) {code}
That basically means, if I generalize, that we consider the gossip to be 
settled as soon as more than 1 live endpoint is detected? 

So when there is a cluster of 50 nodes, without this change, that "if" would 
return false (or it would not return true fast enough to increment numOkay to 
break from that while) as there would be new endpoints or live members detected 
each round.

But the question, as Brandon mentioned, is why it takes almost a minute and 
a half. 

> Waiting for gossip to settle on live endpoints
> --
>
> Key: CASSANDRA-18845
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Cameron Zemek
>Priority: Normal
> Attachments: 18845-3.11.patch, 18845-4.0.patch, 18845-4.1.patch, 
> 18845-5.0.patch
>
>
> This is a follow up to CASSANDRA-18543
> Although that ticket added ability to set cassandra.gossip_settle_min_wait_ms 
> this is tedious and error prone. On a node just observed a 79 second gap 
> between waiting for gossip and the first echo response to indicate a node is 
> UP.
> The problem being that do not want to start Native Transport until gossip 
> settles otherwise queries can fail consistency such as LOCAL_QUORUM as it 
> thinks the replicas are still in DOWN state.
> Instead of having to set gossip_settle_min_wait_ms I am proposing that 
> (outside single node cluster) wait for UP message from another node before 
> considering gossip as settled. Eg.
> {code:java}
>             if (currentSize == epSize && currentLive == liveSize && liveSize 
> > 1)
>             {
>                 logger.debug("Gossip looks settled.");
>                 numOkay++;
>             } {code}






[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-13 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764688#comment-17764688
 ] 

Brandon Williams commented on CASSANDRA-18845:
--

I'm not sure I understand what the problem is.

bq. On a node just observed a 79 second gap between waiting for gossip and the 
first echo response to indicate a node is UP.

It seems like the reason for this is not in the code.  What made the echo 
response take so long?







[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-13 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764515#comment-17764515
 ] 

Stefan Miklosovic commented on CASSANDRA-18845:
---

I informed Cameron privately of the strong preference for an in-jvm dtest to 
verify this behavior. Looking at the test steps described in his comment 
above, it should be rather straightforward to come up with one.







[jira] [Commented] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

2023-09-12 Thread Cameron Zemek (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764467#comment-17764467
 ] 

Cameron Zemek commented on CASSANDRA-18845:
---

I have attached a patch. I tested it as follows:
 # Spin up a single-node cluster. This works because the epSize == liveSize 
check lets it bypass the liveSize > 1 check.
 # Spin up a 3-node cluster. All 3 nodes start NTR as expected.
 # Shut down all nodes. Start the first node; it stays waiting in gossip due 
to the liveSize > 1 requirement.
 # Start the second node. Now both nodes start NTR, since liveSize > 1 and 
there are no other incoming `is now UP` events, so gossip looks settled.
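
The test steps above can be approximated with a small simulation of the polling loop. This is only a sketch under stated assumptions: the class and method names are hypothetical, each sample is an {endpoint count, live count} pair observed on one poll, a round that fails the check resets the counter, and the number of consecutive successful rounds required is passed in as a parameter rather than taken from Cassandra.

```java
// Hypothetical simulation, not Cassandra code.
final class SettleLoopSketch {
    // Returns the index of the poll at which gossip is considered settled,
    // or -1 if the samples run out first. samples[i] = {endpoints, live}.
    static int pollsUntilSettled(int[][] samples, int required) {
        if (samples.length == 0) return -1;
        int epSize = samples[0][0];
        int liveSize = samples[0][1];
        int numOkay = 0;
        for (int i = 1; i < samples.length; i++) {
            int currentSize = samples[i][0];
            int currentLive = samples[i][1];
            // Proposed condition: stable counts and either all endpoints live
            // or at least one other node seen UP.
            if (currentSize == epSize && currentLive == liveSize
                && (epSize == liveSize || liveSize > 1)) {
                numOkay++;
            } else {
                numOkay = 0; // assumption: a failed round resets the counter
            }
            epSize = currentSize;
            liveSize = currentLive;
            if (numOkay == required) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Step 3: first node restarted alone, knows 3 endpoints, only itself live.
        int[][] alone = { {3, 1}, {3, 1}, {3, 1}, {3, 1}, {3, 1} };
        // Step 4: the second node comes UP on the third poll.
        int[][] peerUp = { {3, 1}, {3, 1}, {3, 2}, {3, 2}, {3, 2}, {3, 2} };
        System.out.println(pollsUntilSettled(alone, 3));
        System.out.println(pollsUntilSettled(peerUp, 3));
    }
}
```

With these samples the lone node never settles, while the node that sees one peer come UP settles once the counts have stayed unchanged for the required number of rounds.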



