[ 
https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784212#comment-17784212
 ] 

Paulo Motta commented on CASSANDRA-18968:
-----------------------------------------

It seems like this issue was raised by [~aleksey] on CASSANDRA-13993:
{quote}As implemented currently, we are going to send PINGs potentially to 
3.11/3.0 - unless we switch to gating by version, which we do sometimes.
{quote}
{quote}So I was thinking about a major upgrade bounce scenario. Think the first 
ever node to upgrade to 4.0 in a cluster of 3.0 nodes - will send out pings to 
every node, but receive no pongs, correct? So every node until a threshold will 
have a significantly longer bounce. Do we care about this case?
{quote}
To which [~jasobrown] replied:
{quote}So here's the rub: we don't necessarily know the peer's version yet. The 
ping messages are sent on the large/small connections, but we're not guaranteed 
that at least one round of gossip has completed wherein we would learn the 
version of the peers (we're still at in the startup process).
{quote}
However, I don't think this is a problem, since we [wait for gossip to 
settle|https://github.com/apache/cassandra/blob/7b891db36d4bcfa116ee04e3f4b3f31af798d5b2/src/java/org/apache/cassandra/service/CassandraDaemon.java#L401]
 before executing this check. Can you confirm this, [~brandon.williams]?

The worst that can happen if the version of a peer is unknown is that this 
check is executed unnecessarily and falls back to the current behavior, which 
is not a big deal IMO: startup will just be slightly slower and a warning 
will be logged.

Confirmed that when upgrading a cluster from 3.11 to 4.1, the following message 
is printed in debug.log on every node except the last:
{noformat}
DEBUG [main] 2023-11-08 21:08:42,056 StartupClusterConnectivityChecker.java:97 
- Skipping startup connectivity check as some nodes may be running Cassandra 
version 3 or older which does not support connectivity checking.
{noformat}
On the last node to be upgraded, the check is executed as expected:
{noformat}
INFO  [main] 2023-11-08 21:35:30,387 StartupClusterConnectivityChecker.java:128 
- Blocking coordination until only a single peer is DOWN in the local 
datacenter, timeout=10s
INFO  [main] 2023-11-08 21:35:30,453 StartupClusterConnectivityChecker.java:181 
- Ensured sufficient healthy connections with [DC1] after 63 milliseconds
{noformat}
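For illustration, the gating behaviour seen in the two logs above could be sketched roughly as follows. This is only a minimal sketch, not the committed patch: the class {{ConnectivityCheckGate}}, the method {{shouldSkipCheck}} and the map of peer major versions are hypothetical names invented for the example. It assumes PING support starts at major version 4, and it treats a peer with an unknown version as not triggering a skip, following the description earlier in this comment.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Cassandra: decides whether the startup
// connectivity check should be skipped in a mixed-version cluster.
final class ConnectivityCheckGate {
    // Assumption: peers below major version 4 cannot answer the PING messages.
    static final int MIN_MAJOR_WITH_PING = 4;

    /**
     * Returns true when at least one peer is known to run a major version
     * older than 4. A peer with an unknown version (empty Optional) does not
     * cause a skip, so the check still runs and may simply time out.
     */
    static boolean shouldSkipCheck(Map<String, Optional<Integer>> peerMajorVersions) {
        for (Optional<Integer> major : peerMajorVersions.values()) {
            if (major.isPresent() && major.get() < MIN_MAJOR_WITH_PING) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mid-upgrade: one peer is still on 3.x, so the check is skipped.
        Map<String, Optional<Integer>> midUpgrade =
            Map.of("10.0.0.1", Optional.of(3), "10.0.0.2", Optional.of(4));
        // Last node to upgrade: every peer is already on 4.x, so the check runs.
        Map<String, Optional<Integer>> lastNode =
            Map.of("10.0.0.1", Optional.of(4), "10.0.0.2", Optional.of(4));

        System.out.println("skip mid-upgrade: " + shouldSkipCheck(midUpgrade));
        System.out.println("skip on last node: " + shouldSkipCheck(lastNode));
    }
}
```

Gating on the whole cluster rather than filtering individual peers keeps the sketch close to the "Skipping startup connectivity check" message above; skipping only the pre-4.x peers, as the ticket description also suggests, would be an alternative.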
{quote}+1, I am waiting for another formal +1 and we can ship this!
{quote}
LGTM. Would you like to commit this, [~smiklosovic]? If so, perhaps update the 
CHANGES.txt entry and the commit message to "Skip connectivity check when 
upgrading from 3.X".

> StartupClusterConnectivityChecker fails on upgrade from 3.X
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-18968
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18968
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Startup and Shutdown
>            Reporter: Paulo Motta
>            Assignee: Isaac Reath
>            Priority: Normal
>              Labels: lhf
>             Fix For: 4.0.x, 4.1.x
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Starting up a new 4.X node on a 3.x cluster throws the following warning:
> {noformat}
> WARN  [main] 2023-10-27 15:58:22,234 
> StartupClusterConnectivityChecker.java:183 - Timed out after 10002 
> milliseconds, was waiting for remaining peers to connect: {dc1=[X.Y.Z.W, 
> A.B.C.D]}
> {noformat}
> I think this is because the PING messages used by the startup check are not 
> available on 3.X.
> To provide a smoother upgrade experience, we should probably disable this 
> check on mixed-version clusters, or skip peers on versions < 4.x when doing 
> the connectivity check.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
