N. Amon created CASSANDRA-21132:
-----------------------------------
Summary: Cassandra 5.0.4 startup deadlock: gossip uses pre-5.0.3
encoding due to version negotiation, causing oversized SAI index-status payload
assertion
Key: CASSANDRA-21132
URL: https://issues.apache.org/jira/browse/CASSANDRA-21132
Project: Apache Cassandra
Issue Type: Bug
Reporter: N. Amon
Assignee: Caleb Rackliffe
We appear to be encountering a startup deadlock in Cassandra 5.0.4 where a
homogeneous 5.0.4 cluster cannot fully start because gossip falls back to the
pre-5.0.3 (uncompressed) index-status encoding, even though all nodes are
running >= 5.0.4.
The failure occurs during gossip serialization of SAI index status and prevents
the cluster from ever reaching a state where the 5.0.3+ compressed gossip
encoding can be enabled. This appears to be a bootstrap ordering /
feature-gating issue rather than a misconfiguration.
h2. Environment
* Cassandra version: 5.0.4 on all nodes (I've tested with 5.0.6 also)
* Cluster size: 3 nodes (all same version)
* Large number of keyspaces, tables, and SAI indexes
* No mixed versions and no rolling upgrade in progress
h2. Observed behavior
During startup, nodes fail with an assertion in the gossip thread:
ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
java.lang.RuntimeException: java.lang.AssertionError
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
    at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
    at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.AssertionError: null
    at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
    at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
    at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
    at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
    at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
    at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
    at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
    at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
    at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
    at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
    at org.apache.cassandra.net.Message.serializedSize(Message.java:1111)
    at org.apache.cassandra.net.OutboundConnections.connectionTypeFor(OutboundConnections.java:215)
    at org.apache.cassandra.net.OutboundConnections.connectionFor(OutboundConnections.java:207)
    at org.apache.cassandra.net.OutboundConnections.enqueue(OutboundConnections.java:96)
    at org.apache.cassandra.net.MessagingService.doSend(MessagingService.java:473)
    at org.apache.cassandra.net.OutboundSink.accept(OutboundSink.java:70)
    at org.apache.cassandra.net.MessagingService.send(MessagingService.java:462)
    at org.apache.cassandra.net.MessagingService.send(MessagingService.java:437)
    at org.apache.cassandra.gms.GossipDigestSynVerbHandler.doVerb(GossipDigestSynVerbHandler.java:110)
    at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
    ... 7 common frames omitted
This failure occurs whether the cluster is started via a rolling startup or
complete bring-down/bring-up.
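As a rough illustration of why the assertion at {{TypeSizes.sizeof}} in the trace above may fire, the sketch below compares the size of a legacy-style and a compact-style index-status string for a cluster with many indexes. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the two string layouts only approximate the formats described in this report, and the limit of {{Short.MAX_VALUE}} bytes assumes the string is written with a 2-byte signed length prefix, which has not been verified against the 5.0.4 source.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; none of these names are Cassandra APIs.
final class IndexStatusSizeSketch
{
    // Assumption: the gossip string value must fit a 2-byte signed length prefix.
    static final int ASSUMED_MAX_BYTES = Short.MAX_VALUE;

    // Hypothetical keyspace and index names, chosen only to give realistic lengths.
    static String ksName(int k) { return "application_keyspace_" + k; }
    static String idxName(int i) { return "orders_by_customer_idx_" + i; }

    // Approximation of the legacy layout: keyspace repeated per index, status spelled out.
    static String legacyStyle(int keyspaces, int indexesPerKeyspace)
    {
        StringBuilder sb = new StringBuilder("{");
        for (int k = 0; k < keyspaces; k++)
        {
            for (int i = 0; i < indexesPerKeyspace; i++)
            {
                if (sb.length() > 1)
                    sb.append(',');
                sb.append('"').append(ksName(k)).append('.').append(idxName(i))
                  .append("\":\"BUILD_SUCCEEDED\"");
            }
        }
        return sb.append('}').toString();
    }

    // Approximation of the compact layout: keyspace written once, numeric status code.
    static String compactStyle(int keyspaces, int indexesPerKeyspace)
    {
        StringBuilder sb = new StringBuilder("{");
        for (int k = 0; k < keyspaces; k++)
        {
            if (k > 0)
                sb.append(',');
            sb.append('"').append(ksName(k)).append("\":{");
            for (int i = 0; i < indexesPerKeyspace; i++)
            {
                if (i > 0)
                    sb.append(',');
                sb.append('"').append(idxName(i)).append("\":1");
            }
            sb.append('}');
        }
        return sb.append('}').toString();
    }

    static int utf8Length(String s)
    {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean fits(String s)
    {
        return utf8Length(s) <= ASSUMED_MAX_BYTES;
    }

    public static void main(String[] args)
    {
        String legacy = legacyStyle(6, 100);
        String compact = compactStyle(6, 100);
        System.out.println("legacy bytes:  " + utf8Length(legacy) + ", fits: " + fits(legacy));
        System.out.println("compact bytes: " + utf8Length(compact) + ", fits: " + fits(compact));
    }
}
```

With six keyspaces of 100 indexes each, the legacy-style string in this sketch exceeds the assumed limit while the compact-style string stays under it. The real threshold depends on actual keyspace and index name lengths.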
The cluster contains approximately six large application keyspaces. These
keyspaces were originally created sequentially without issue. My understanding
is that this succeeds because when the cluster initially starts without
application keyspaces, gossip converges successfully and the 5.0.3+ compressed
index-status format (introduced in CASSANDRA-20058) becomes active.
I can confirm this by running:
nodetool gossipinfo
and observing that the newer format is used (keyspace names are not duplicated
per index and index status is encoded numerically rather than as strings).
After restarting all nodes in the cluster:
* Only the first node starts successfully
* Subsequent nodes fail during gossip with the assertion above
* Running nodetool gossipinfo again shows that the index-status gossip format
has reverted to the pre-5.0.3 representation, with duplicated keyspace names
and string status values
h2. Relevant code path
Reviewing the 5.0.3 fix (commit 376fe2a9fe3f13c7555c40cda6d3912d55ef63cc), the
serialization format appears to be gated on the minimum Cassandra version
observed via gossip:
// Versions 5.0.0 through 5.0.2 use a much more bloated format that duplicates keyspace names
// and writes full status names instead of their numeric codes. If the minimum cluster version is
// unknown or one of those 3 versions, continue to propagate the old format.
CassandraVersion minVersion = Gossiper.instance.getMinVersion(1, TimeUnit.SECONDS);
String newSerializedStatusMap = shouldWriteLegacyStatusFormat(minVersion)
                                ? JsonUtils.writeAsJsonString(statusMap)
                                : toSerializedFormat(statusMap);
h2. Hypothesis
The working theory is that {{getMinVersion()}} itself depends on gossip
convergence during startup. Until all nodes have successfully joined and
advertised their {{{}RELEASE_VERSION{}}}, the minimum cluster version is
treated as unknown or conservatively low. This causes Cassandra to fall back to
the legacy (pre-5.0.3) index-status serialization, which in our case produces a
gossip payload large enough to trigger the assertion.
As a result:
* Nodes cannot join successfully
* Gossip never converges
* The newer compressed encoding is never enabled
* The cluster enters a startup deadlock
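The gating behaviour behind this hypothesis can be modelled in isolation. The sketch below is a minimal stand-alone model, not Cassandra code: {{SimpleVersion}} is a hypothetical stand-in for {{CassandraVersion}}, and a {{null}} argument models the case where the minimum cluster version is not yet known. The predicate itself mirrors the one in the 5.0.3 fix as quoted in this report.

```java
// Minimal model of the version gate; SimpleVersion is a hypothetical stand-in
// for CassandraVersion and is not part of Cassandra.
final class LegacyGateSketch
{
    static final class SimpleVersion
    {
        final int major;
        final int minor;
        final int patch;

        SimpleVersion(int major, int minor, int patch)
        {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }
    }

    // Same condition as the method quoted in this report:
    // legacy format when the minimum version is unknown (null) or 5.0.0 through 5.0.2.
    static boolean shouldWriteLegacyStatusFormat(SimpleVersion minVersion)
    {
        return minVersion == null
               || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
    }

    public static void main(String[] args)
    {
        // Unknown minimum version, as hypothesised during startup before gossip converges.
        System.out.println("unknown -> legacy? " + shouldWriteLegacyStatusFormat(null));
        // Known minimum versions on either side of the 5.0.3 boundary.
        System.out.println("5.0.2   -> legacy? " + shouldWriteLegacyStatusFormat(new SimpleVersion(5, 0, 2)));
        System.out.println("5.0.4   -> legacy? " + shouldWriteLegacyStatusFormat(new SimpleVersion(5, 0, 4)));
    }
}
```

In this model a node whose minimum-version lookup returns nothing is treated exactly like a 5.0.0 through 5.0.2 node, which is the condition the hypothesis depends on.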
h2. Questions
* Is this understanding of the startup/version-gating behavior correct?
* Is this a known limitation or bug in 5.0.x?
* Are there recommended workarounds to bootstrap a homogeneous cluster in this
state (short of migrating keyspaces to another cluster)?
h2. Update
I tested the hypothesis by modifying the
[fix|https://github.com/apache/cassandra/commit/376fe2a9fe3f13c7555c40cda6d3912d55ef63cc]
that the Cassandra team released in 5.0.3 for the original issue, specifically
this method:
private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
}
I changed it to always return {{false}}, so that the new, more compact
index-status serialization is used before the cluster stabilizes. With this
change, {*}the cluster started successfully{*}. This is only a hack; I'm hoping
someone from the Cassandra team reads this and can provide an official fix.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)