N. Amon created CASSANDRA-21132:
-----------------------------------

             Summary: Cassandra 5.0.4 startup deadlock: gossip uses pre-5.0.3 
encoding due to version negotiation, causing oversized SAI index-status payload 
assertion
                 Key: CASSANDRA-21132
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21132
             Project: Apache Cassandra
          Issue Type: Bug
            Reporter: N. Amon
            Assignee: Caleb Rackliffe


We appear to be encountering a startup deadlock in Cassandra 5.0.4 where a 
homogeneous 5.0.4 cluster cannot fully start because gossip falls back to the 
pre-5.0.3 (uncompressed) index-status encoding, even though all nodes are 
running >= 5.0.4.

The failure occurs during gossip serialization of SAI index status and prevents 
the cluster from ever reaching a state where the 5.0.3+ compressed gossip 
encoding can be enabled. This appears to be a bootstrap ordering / 
feature-gating issue rather than a misconfiguration.
h2. Environment
 * Cassandra version: 5.0.4 on all nodes (I've tested with 5.0.6 also)
 * Cluster size: 3 nodes (all same version)
 * Large number of keyspaces, tables, and SAI indexes
 * No mixed versions and no rolling upgrade in progress

h2. Observed behavior

During startup, nodes fail with an assertion in the gossip thread:
ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
java.lang.RuntimeException: java.lang.AssertionError
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
        at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
        at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.AssertionError: null
        at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
        at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
        at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
        at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
        at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
        at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
        at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
        at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
        at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
        at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
        at org.apache.cassandra.net.Message.serializedSize(Message.java:1111)
        at org.apache.cassandra.net.OutboundConnections.connectionTypeFor(OutboundConnections.java:215)
        at org.apache.cassandra.net.OutboundConnections.connectionFor(OutboundConnections.java:207)
        at org.apache.cassandra.net.OutboundConnections.enqueue(OutboundConnections.java:96)
        at org.apache.cassandra.net.MessagingService.doSend(MessagingService.java:473)
        at org.apache.cassandra.net.OutboundSink.accept(OutboundSink.java:70)
        at org.apache.cassandra.net.MessagingService.send(MessagingService.java:462)
        at org.apache.cassandra.net.MessagingService.send(MessagingService.java:437)
        at org.apache.cassandra.gms.GossipDigestSynVerbHandler.doVerb(GossipDigestSynVerbHandler.java:110)
        at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
        ... 7 common frames omitted
This failure occurs whether the cluster is started via a rolling startup or 
complete bring-down/bring-up.

The cluster contains approximately six large application keyspaces. These 
keyspaces were originally created sequentially without issue. My understanding 
is that this succeeds because when the cluster initially starts without 
application keyspaces, gossip converges successfully and the 5.0.3+ compressed 
index-status format (introduced in CASSANDRA-20058) becomes active.

I can confirm this by running:
nodetool gossipinfo
and observing that the newer format is used (keyspace names are not duplicated 
per index and index status is encoded numerically rather than as strings).

After restarting all nodes in the cluster:
 * Only the first node starts successfully
 * Subsequent nodes fail during gossip with the assertion above
 * Running nodetool gossipinfo again shows that the index-status gossip format 
has reverted to the pre-5.0.3 representation, with duplicated keyspace names 
and string status values
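
To give a rough feel for why the two representations differ so much in size, here is a minimal sketch. Everything in it is an assumption made for illustration: the keyspace and index names, the counts, the status code {{1}}, and both string layouts are invented and are not the actual 5.0.x wire formats. The 16-bit limit is my reading of what the assertion in {{TypeSizes.sizeof}} guards and has not been confirmed against the source.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Illustrative only: compares a "legacy-like" and a "compact-like" encoding
// of index statuses. Neither layout is the real Cassandra format.
class IndexStatusSizeSketch {
    // ASSUMPTION: the serialized string must fit a signed 16-bit length.
    static final int ASSUMED_LIMIT = Short.MAX_VALUE;

    // Legacy-like: keyspace name repeated per index, status written as a string.
    static String legacyLike(int keyspaces, int indexesPerKeyspace) {
        StringJoiner entries = new StringJoiner(",", "{", "}");
        for (int k = 0; k < keyspaces; k++)
            for (int i = 0; i < indexesPerKeyspace; i++)
                entries.add("\"app_keyspace_" + k + ".table_" + i + "_sai_idx\":\"BUILD_SUCCEEDED\"");
        return entries.toString();
    }

    // Compact-like: keyspace name written once, status written as a numeric code.
    static String compactLike(int keyspaces, int indexesPerKeyspace) {
        StringJoiner outer = new StringJoiner(",", "{", "}");
        for (int k = 0; k < keyspaces; k++) {
            StringJoiner inner = new StringJoiner(",", "{", "}");
            for (int i = 0; i < indexesPerKeyspace; i++)
                inner.add("\"table_" + i + "_sai_idx\":1"); // 1 = hypothetical status code
            outer.add("\"app_keyspace_" + k + "\":" + inner);
        }
        return outer.toString();
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int legacy = utf8Length(legacyLike(6, 150));
        int compact = utf8Length(compactLike(6, 150));
        System.out.println("legacy-like bytes:  " + legacy + " (over assumed limit: " + (legacy > ASSUMED_LIMIT) + ")");
        System.out.println("compact-like bytes: " + compact + " (over assumed limit: " + (compact > ASSUMED_LIMIT) + ")");
    }
}
```

With these made-up numbers (6 keyspaces, 150 indexes each), the legacy-like string exceeds the assumed limit while the compact-like one stays under it, which matches the pattern we see in practice.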

h2. Relevant code path

Reviewing the 5.0.3 fix (commit 376fe2a9fe3f13c7555c40cda6d3912d55ef63cc), the 
serialization format appears to be gated on the minimum Cassandra version 
observed via gossip:
// Versions 5.0.0 through 5.0.2 use a much more bloated format that duplicates keyspace names
// and writes full status names instead of their numeric codes. If the minimum cluster version is
// unknown or one of those 3 versions, continue to propagate the old format.
CassandraVersion minVersion = Gossiper.instance.getMinVersion(1, TimeUnit.SECONDS);
String newSerializedStatusMap =
    shouldWriteLegacyStatusFormat(minVersion)
        ? JsonUtils.writeAsJsonString(statusMap)
        : toSerializedFormat(statusMap);
h2. Hypothesis

The working theory is that {{getMinVersion()}} itself depends on gossip 
convergence during startup. Until all nodes have successfully joined and 
advertised their {{RELEASE_VERSION}}, the minimum cluster version is 
treated as unknown or conservatively low. This causes Cassandra to fall back to 
the legacy (pre-5.0.3) index-status serialization, which in our case produces a 
gossip payload large enough to trigger the assertion.

As a result:
 * Nodes cannot join successfully
 * Gossip never converges
 * The newer compressed encoding is never enabled
 * The cluster enters a startup deadlock
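
The fallback at the heart of this hypothesis can be modelled in isolation. The sketch below is a standalone stand-in, not Cassandra code: {{Version}} is a hypothetical substitute for {{CassandraVersion}}, and {{null}} stands for "minimum version not yet known". It says nothing about how {{getMinVersion()}} actually arrives at its result; it only shows that an unknown minimum version selects the legacy format even when every node would report 5.0.4.

```java
// Standalone model of the version gate described in this report.
class LegacyFormatGateSketch {
    // Hypothetical stand-in for org.apache.cassandra.utils.CassandraVersion.
    static final class Version {
        final int major, minor, patch;
        Version(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }
    }

    // Same predicate as quoted in this report: an unknown (null) minimum
    // version, or 5.0.0 through 5.0.2, selects the legacy format.
    static boolean shouldWriteLegacyStatusFormat(Version minVersion) {
        return minVersion == null
            || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
    }

    public static void main(String[] args) {
        // Minimum version not yet known (e.g. during startup): legacy format.
        System.out.println(shouldWriteLegacyStatusFormat(null));                 // true
        // Genuinely old node in the cluster: legacy format.
        System.out.println(shouldWriteLegacyStatusFormat(new Version(5, 0, 2))); // true
        // Minimum version known to be 5.0.4: compact format.
        System.out.println(shouldWriteLegacyStatusFormat(new Version(5, 0, 4))); // false
    }
}
```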

h2. Questions
 * Is this understanding of the startup/version-gating behavior correct?
 * Is this a known limitation or bug in 5.0.x?
 * Are there recommended workarounds to bootstrap a homogeneous cluster in this 
state (short of migrating keyspaces to another cluster)?

h2. Update

I tested the hypothesis by hacking the 
[fix|https://github.com/apache/cassandra/commit/376fe2a9fe3f13c7555c40cda6d3912d55ef63cc]
 for the issue released by the Cassandra team in 5.0.3, specifically this 
method:
private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
}
 

I changed it to always return {{false}}, so that the new, more compact 
index-status serialization is used before the cluster stabilizes. With this 
change, *the cluster started successfully*. This is a hack; I'm hoping someone 
from the Cassandra team reads this and provides an official fix.
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
