[ 
https://issues.apache.org/jira/browse/CASSANDRA-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053970#comment-18053970
 ] 

Caleb Rackliffe commented on CASSANDRA-21132:
---------------------------------------------

There's a possibility that this is a gossip bug that affects non-rolling 
restarts. Does it reproduce only for full cluster bring-down/bring-up?

In either case, I think the immediate solution would be a system 
property/YAML option to force the new format, partly because it might be 
valuable independent of this problem. (A possible fix for gossip could then go 
in a separate issue/Jira.)
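
A rough sketch of how such an override could sit in front of the existing 
version check. The property name, the Version stand-in, and the class name are 
all hypothetical and used only for illustration; this is not an existing 
Cassandra option or the eventual patch.

```java
final class IndexStatusFormatGate {
    /** Minimal stand-in for CassandraVersion, for illustration only. */
    static final class Version {
        final int major, minor, patch;

        Version(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }
    }

    // Hypothetical property name; not an existing Cassandra option.
    static final String FORCE_PROPERTY = "example.force_compact_index_status";

    /** Mirrors the version check quoted in the issue description. */
    public static boolean isLegacyVersion(Version minVersion) {
        return minVersion == null
            || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
    }

    /** The override wins; otherwise fall back to the version-based check. */
    public static boolean shouldWriteLegacyStatusFormat(Version minVersion, boolean forceCompact) {
        return !forceCompact && isLegacyVersion(minVersion);
    }

    public static void main(String[] args) {
        // Boolean.getBoolean reads a JVM system property, e.g. -Dexample.force_compact_index_status=true
        boolean force = Boolean.getBoolean(FORCE_PROPERTY);
        System.out.println("legacy format for unknown min version: "
                           + shouldWriteLegacyStatusFormat(null, force));
    }
}
```

With the flag set, an unknown minimum version no longer selects the legacy 
format, which matches the behavior of the reporter's local hack without 
removing the version check for mixed-version clusters.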

> Cassandra 5.0.4 startup deadlock: gossip uses pre-5.0.3 encoding due to 
> version negotiation, causing oversized SAI index-status payload assertion
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21132
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21132
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: N. Amon
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>
> We appear to be encountering a startup deadlock in Cassandra 5.0.4 where a 
> homogeneous 5.0.4 cluster cannot fully start because gossip falls back to the 
> pre-5.0.3 (uncompressed) index-status encoding, even though all nodes are 
> running >= 5.0.4.
> The failure occurs during gossip serialization of SAI index status and 
> prevents the cluster from ever reaching a state where the 5.0.3+ compressed 
> gossip encoding can be enabled. This appears to be a bootstrap ordering / 
> feature-gating issue rather than a misconfiguration.
> h2. Environment
>  * Cassandra version: 5.0.4 on all nodes (I've tested with 5.0.6 also)
>  * Cluster size: 3 nodes (all same version)
>  * Large number of keyspaces, tables, and SAI indexes
>  * No mixed versions and no rolling upgrade in progress
> h2. Observed behavior
> During startup, nodes fail with an assertion in the gossip thread:
> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
> java.lang.RuntimeException: java.lang.AssertionError
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>         at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>         at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.AssertionError: null
>         at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>         at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>         at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>         at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>         at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>         at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>         at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>         at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>         at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>         at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>         at org.apache.cassandra.net.Message.serializedSize(Message.java:1111)
>         at org.apache.cassandra.net.OutboundConnections.connectionTypeFor(OutboundConnections.java:215)
>         at org.apache.cassandra.net.OutboundConnections.connectionFor(OutboundConnections.java:207)
>         at org.apache.cassandra.net.OutboundConnections.enqueue(OutboundConnections.java:96)
>         at org.apache.cassandra.net.MessagingService.doSend(MessagingService.java:473)
>         at org.apache.cassandra.net.OutboundSink.accept(OutboundSink.java:70)
>         at org.apache.cassandra.net.MessagingService.send(MessagingService.java:462)
>         at org.apache.cassandra.net.MessagingService.send(MessagingService.java:437)
>         at org.apache.cassandra.gms.GossipDigestSynVerbHandler.doVerb(GossipDigestSynVerbHandler.java:110)
>         at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
>         ... 7 common frames omitted
> This failure occurs whether the cluster is started via a rolling startup or 
> complete bring-down/bring-up.
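
The trace does not say what the failing assertion checks. The sketch below 
assumes it is a length limit on a string written with a signed 16-bit length 
prefix, which would be consistent with an oversized gossip value reaching 
TypeSizes.sizeof. The class and method names are made up for illustration and 
the limit is an assumption, not something read from the Cassandra source here.

```java
import java.nio.charset.StandardCharsets;

final class GossipValueSizeCheck {
    // Assumed limit: the largest length a signed 16-bit prefix can describe.
    static final int MAX_ENCODED_LENGTH = Short.MAX_VALUE;

    /** Number of bytes the value occupies once encoded as UTF-8. */
    public static int encodedLength(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True when the encoded value fits under the assumed limit. */
    public static boolean fitsInShortPrefixedString(String value) {
        return encodedLength(value) <= MAX_ENCODED_LENGTH;
    }

    public static void main(String[] args) {
        String small = "x".repeat(1_000);
        String large = "x".repeat(40_000);
        System.out.println(fitsInShortPrefixedString(small)); // true
        System.out.println(fitsInShortPrefixedString(large)); // false
    }
}
```

Under that assumption, any single gossip value whose encoded form exceeds 
32,767 bytes would fail while its size is being computed, before anything is 
sent.
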
> The cluster contains approximately six large application keyspaces. These 
> keyspaces were originally created sequentially without issue. My 
> understanding is that this succeeds because when the cluster initially starts 
> without application keyspaces, gossip converges successfully and the 5.0.3+ 
> compressed index-status format (introduced in CASSANDRA-20058) becomes active.
> I can confirm this by running:
> nodetool gossipinfo
> and observing that the newer format is used (keyspace names are not 
> duplicated per index and index status is encoded numerically rather than as 
> strings).
> After restarting all nodes in the cluster:
>  * Only the first node starts successfully
>  * Subsequent nodes fail during gossip with the assertion above
>  * Running nodetool gossipinfo again shows that the index-status gossip 
> format has reverted to the pre-5.0.3 representation, with duplicated keyspace 
> names and string status values
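
To illustrate why the two representations differ so much in size, here is a 
small comparison of a verbose shape (keyspace repeated per index, status 
spelled out) against a compact shape (keyspace written once, numeric status). 
Both shapes, the status name, and its numeric code are placeholders chosen for 
the example; they are not the actual gossip encodings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class IndexStatusEncodingDemo {
    /** Verbose shape: one entry per index, keyspace repeated, status as text. */
    public static String verbose(Map<String, Map<String, String>> statusByKeyspace) {
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (var ks : statusByKeyspace.entrySet())
            for (var idx : ks.getValue().entrySet())
                out.add("\"" + ks.getKey() + "." + idx.getKey() + "\":\"" + idx.getValue() + "\"");
        return out.toString();
    }

    /** Compact shape: keyspace written once, status replaced by a numeric code. */
    public static String compact(Map<String, Map<String, String>> statusByKeyspace,
                                 Map<String, Integer> codes) {
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (var ks : statusByKeyspace.entrySet()) {
            StringJoiner inner = new StringJoiner(",", "{", "}");
            for (var idx : ks.getValue().entrySet())
                inner.add("\"" + idx.getKey() + "\":" + codes.get(idx.getValue()));
            out.add("\"" + ks.getKey() + "\":" + inner);
        }
        return out.toString();
    }

    /** Builds a synthetic status map with the given number of keyspaces and indexes. */
    public static Map<String, Map<String, String>> sample(int keyspaces, int indexesPerKeyspace) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (int k = 0; k < keyspaces; k++) {
            Map<String, String> indexes = new LinkedHashMap<>();
            for (int i = 0; i < indexesPerKeyspace; i++)
                indexes.put("table_" + i + "_idx", "BUILD_SUCCEEDED");
            result.put("application_keyspace_" + k, indexes);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> codes = Map.of("BUILD_SUCCEEDED", 1); // placeholder code
        Map<String, Map<String, String>> status = sample(6, 200);
        System.out.println("verbose length: " + verbose(status).length());
        System.out.println("compact length: " + compact(status, codes).length());
    }
}
```

The verbose shape pays for the keyspace name and the status string on every 
index, so its size grows much faster with the number of indexes.
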
> h2. Relevant code path
> Reviewing the 5.0.3 fix (commit 376fe2a9fe3f13c7555c40cda6d3912d55ef63cc), 
> the serialization format appears to be gated on the minimum Cassandra version 
> observed via gossip:
> // Versions 5.0.0 through 5.0.2 use a much more bloated format that duplicates keyspace names
> // and writes full status names instead of their numeric codes. If the minimum cluster version is
> // unknown or one of those 3 versions, continue to propagate the old format.
> CassandraVersion minVersion = Gossiper.instance.getMinVersion(1, TimeUnit.SECONDS);
> String newSerializedStatusMap =
>     shouldWriteLegacyStatusFormat(minVersion)
>         ? JsonUtils.writeAsJsonString(statusMap)
>         : toSerializedFormat(statusMap);
> h2. Hypothesis
> The working theory is that {{getMinVersion()}} itself depends on gossip 
> convergence during startup. Until all nodes have successfully joined and 
> advertised their {{RELEASE_VERSION}}, the minimum cluster version is 
> treated as unknown or conservatively low. This causes Cassandra to fall back 
> to the legacy (pre-5.0.3) index-status serialization, which in our case 
> produces a gossip payload large enough to trigger the assertion.
> As a result:
>  * Nodes cannot join successfully
>  * Gossip never converges
>  * The newer compressed encoding is never enabled
>  * The cluster enters a startup deadlock
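
The hypothesis can be modeled in a few lines. This is a simplified model of 
the reasoning above, not Gossiper's actual logic: it assumes the minimum 
version is reported as unknown whenever any endpoint has not yet advertised a 
release version. The Version stand-in and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class MinVersionModel {
    /** Minimal comparable stand-in for CassandraVersion, for illustration only. */
    static final class Version implements Comparable<Version> {
        final int major, minor, patch;

        Version(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }

        @Override
        public int compareTo(Version other) {
            if (major != other.major) return Integer.compare(major, other.major);
            if (minor != other.minor) return Integer.compare(minor, other.minor);
            return Integer.compare(patch, other.patch);
        }
    }

    /**
     * Returns the lowest advertised version, or empty when the list is empty
     * or any endpoint's version is still unknown (represented here as null).
     */
    public static Optional<Version> minVersion(List<Version> advertised) {
        Version min = null;
        for (Version v : advertised) {
            if (v == null)
                return Optional.empty();
            if (min == null || v.compareTo(min) < 0)
                min = v;
        }
        return Optional.ofNullable(min);
    }

    public static void main(String[] args) {
        Version v504 = new Version(5, 0, 4);
        // One node has not advertised its version yet, so the minimum is unknown.
        System.out.println(minVersion(Arrays.asList(v504, null, v504)).isPresent()); // false
        // Once every node has advertised, the minimum is known.
        System.out.println(minVersion(Arrays.asList(v504, v504, v504)).isPresent()); // true
    }
}
```

In this model the empty result corresponds to the null case in the quoted 
check, which selects the legacy format. If the legacy payload is itself what 
prevents the remaining nodes from joining, the unknown state never clears.
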
> h2. Questions
>  * Is this understanding of the startup/version-gating behavior correct?
>  * Is this a known limitation or bug in 5.0.x?
>  * Are there recommended workarounds to bootstrap a homogeneous cluster in 
> this state (short of migrating keyspaces to another cluster)?
> h2. Update
> I tested the hypothesis by hacking the 
> [fix|https://github.com/apache/cassandra/commit/376fe2a9fe3f13c7555c40cda6d3912d55ef63cc]
>  for the issue released by the Cassandra team in 5.0.3, specifically this 
> method:
> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
> {
>     return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
> }
>  
> I changed it to always return {{false}}, so that the new, more compact 
> index-status serialization is activated before the cluster stabilizes. With 
> this change, *the cluster started successfully*. This is a hack; I'm hoping 
> someone from the Cassandra team reads this and provides an official fix.



