[ https://issues.apache.org/jira/browse/CASSANDRA-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053970#comment-18053970 ]
Caleb Rackliffe commented on CASSANDRA-21132:
---------------------------------------------
There's a possibility that this is a gossip bug that affects non-rolling
restarts. Does it reproduce only for full cluster bring-down/bring-up?
In either case, I think the immediate solution is a system property/YAML option
to force the new format, partly because it might be valuable independent of this
problem. A possible fix for gossip could then follow in a separate issue/Jira.
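
A minimal Java sketch of how such an override could sit in front of the existing version gate. The property name `example.force_compact_index_status` and the `Version` stand-in class are hypothetical, chosen only for illustration; they are not existing Cassandra options or types. Only the version check itself mirrors the method quoted in the issue description.

```java
final class IndexStatusFormatGate {
    // Hypothetical property name for illustration; not an existing Cassandra option.
    static final String FORCE_COMPACT_PROPERTY = "example.force_compact_index_status";

    /** Minimal stand-in for a release version (major.minor.patch only). */
    static final class Version {
        final int major, minor, patch;

        Version(int major, int minor, int patch) {
            this.major = major;
            this.minor = minor;
            this.patch = patch;
        }
    }

    /** Same condition as the quoted gate: unknown, or 5.0.0 through 5.0.2. */
    static boolean isLegacyOrUnknown(Version minVersion) {
        return minVersion == null
            || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
    }

    /** With the override set, the compact format is written even if the minimum version is unknown. */
    static boolean shouldWriteLegacyStatusFormat(Version minVersion, boolean forceCompact) {
        return !forceCompact && isLegacyOrUnknown(minVersion);
    }

    public static void main(String[] args) {
        // Reads -Dexample.force_compact_index_status=true; defaults to false when unset.
        boolean forceCompact = Boolean.getBoolean(FORCE_COMPACT_PROPERTY);
        System.out.println("unknown min version, legacy format: "
                           + shouldWriteLegacyStatusFormat(null, forceCompact));
    }
}
```

The override is deliberately one-directional in this sketch: it can only force the compact format, so leaving it unset keeps today's behavior for mixed-version clusters.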
> Cassandra 5.0.4 startup deadlock: gossip uses pre-5.0.3 encoding due to
> version negotiation, causing oversized SAI index-status payload assertion
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21132
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: N. Amon
> Assignee: Caleb Rackliffe
> Priority: Normal
>
> We appear to be encountering a startup deadlock in Cassandra 5.0.4 where a
> homogeneous 5.0.4 cluster cannot fully start because gossip falls back to the
> pre-5.0.3 (uncompressed) index-status encoding, even though all nodes are
> running >= 5.0.4.
> The failure occurs during gossip serialization of SAI index status and
> prevents the cluster from ever reaching a state where the 5.0.3+ compressed
> gossip encoding can be enabled. This appears to be a bootstrap ordering /
> feature-gating issue rather than a misconfiguration.
> h2. Environment
> * Cassandra version: 5.0.4 on all nodes (I've tested with 5.0.6 also)
> * Cluster size: 3 nodes (all same version)
> * Large number of keyspaces, tables, and SAI indexes
> * No mixed versions and no rolling upgrade in progress
> h2. Observed behavior
> During startup, nodes fail with an assertion in the gossip thread:
> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
> java.lang.RuntimeException: java.lang.AssertionError
>     at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>     at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>     at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>     at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.AssertionError: null
>     at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>     at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>     at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>     at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>     at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>     at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>     at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>     at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>     at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>     at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>     at org.apache.cassandra.net.Message.serializedSize(Message.java:1111)
>     at org.apache.cassandra.net.OutboundConnections.connectionTypeFor(OutboundConnections.java:215)
>     at org.apache.cassandra.net.OutboundConnections.connectionFor(OutboundConnections.java:207)
>     at org.apache.cassandra.net.OutboundConnections.enqueue(OutboundConnections.java:96)
>     at org.apache.cassandra.net.MessagingService.doSend(MessagingService.java:473)
>     at org.apache.cassandra.net.OutboundSink.accept(OutboundSink.java:70)
>     at org.apache.cassandra.net.MessagingService.send(MessagingService.java:462)
>     at org.apache.cassandra.net.MessagingService.send(MessagingService.java:437)
>     at org.apache.cassandra.gms.GossipDigestSynVerbHandler.doVerb(GossipDigestSynVerbHandler.java:110)
>     at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
>     at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
>     ... 7 common frames omitted
> This failure occurs whether the cluster is started via a rolling startup or
> complete bring-down/bring-up.
> The cluster contains approximately six large application keyspaces. These
> keyspaces were originally created sequentially without issue. My
> understanding is that this succeeds because when the cluster initially starts
> without application keyspaces, gossip converges successfully and the 5.0.3+
> compressed index-status format (introduced in CASSANDRA-20058) becomes active.
> I can confirm this by running:
> nodetool gossipinfo
> and observing that the newer format is used (keyspace names are not
> duplicated per index and index status is encoded numerically rather than as
> strings).
> After restarting all nodes in the cluster:
> * Only the first node starts successfully
> * Subsequent nodes fail during gossip with the assertion above
> * Running nodetool gossipinfo again shows that the index-status gossip
> format has reverted to the pre-5.0.3 representation, with duplicated keyspace
> names and string status values
> h2. Relevant code path
> Reviewing the 5.0.3 fix (commit 376fe2a9fe3f13c7555c40cda6d3912d55ef63cc),
> the serialization format appears to be gated on the minimum Cassandra version
> observed via gossip:
> // Versions 5.0.0 through 5.0.2 use a much more bloated format that duplicates keyspace names
> // and writes full status names instead of their numeric codes. If the minimum cluster version is
> // unknown or one of those 3 versions, continue to propagate the old format.
> CassandraVersion minVersion = Gossiper.instance.getMinVersion(1, TimeUnit.SECONDS);
> String newSerializedStatusMap = shouldWriteLegacyStatusFormat(minVersion)
>                                 ? JsonUtils.writeAsJsonString(statusMap)
>                                 : toSerializedFormat(statusMap);
> h2. Hypothesis
> The working theory is that {{getMinVersion()}} itself depends on gossip
> convergence during startup. Until all nodes have successfully joined and
> advertised their {{RELEASE_VERSION}}, the minimum cluster version is
> treated as unknown or conservatively low. This causes Cassandra to fall back
> to the legacy (pre-5.0.3) index-status serialization, which in our case
> produces a gossip payload large enough to trigger the assertion.
> As a result:
> * Nodes cannot join successfully
> * Gossip never converges
> * The newer compressed encoding is never enabled
> * The cluster enters a startup deadlock
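
The size side of this hypothesis can be illustrated with a rough Java sketch. Everything in it is an assumption made for illustration: that the assertion in `TypeSizes.sizeof` guards a string whose UTF-8 length must fit a signed 16-bit length prefix, the keyspace and index names, the count of 150 indexes per keyspace, the `BUILD_SUCCEEDED` status with numeric code 3, and both JSON layouts. The layouts only follow the description above (keyspace name repeated per index with string statuses, versus keyspace name written once with numeric statuses); the real formats from CASSANDRA-20058 may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

final class IndexStatusPayloadSketch {
    // Assumption: a gossip string value must fit a signed 16-bit length prefix.
    static final int MAX_STRING_BYTES = Short.MAX_VALUE;

    // Illustrative status name and numeric code; not taken from the Cassandra source.
    static final String STATUS_NAME = "BUILD_SUCCEEDED";
    static final int STATUS_CODE = 3;

    static String keyspaceName(int k) { return "application_keyspace_" + k; }

    static String indexName(int i) { return "table_" + i + "_value_idx"; }

    /** Legacy-style layout: {"ks.idx":"STATUS",...} with the keyspace repeated per index. */
    static String legacyFormat(int keyspaces, int indexesPerKeyspace) {
        StringJoiner entries = new StringJoiner(",", "{", "}");
        for (int k = 0; k < keyspaces; k++)
            for (int i = 0; i < indexesPerKeyspace; i++)
                entries.add("\"" + keyspaceName(k) + "." + indexName(i) + "\":\"" + STATUS_NAME + "\"");
        return entries.toString();
    }

    /** Compact-style layout: {"ks":{"idx":code,...},...} with each keyspace written once. */
    static String compactFormat(int keyspaces, int indexesPerKeyspace) {
        StringJoiner outer = new StringJoiner(",", "{", "}");
        for (int k = 0; k < keyspaces; k++) {
            StringJoiner inner = new StringJoiner(",", "{", "}");
            for (int i = 0; i < indexesPerKeyspace; i++)
                inner.add("\"" + indexName(i) + "\":" + STATUS_CODE);
            outer.add("\"" + keyspaceName(k) + "\":" + inner);
        }
        return outer.toString();
    }

    static int utf8Length(String s) { return s.getBytes(StandardCharsets.UTF_8).length; }

    static boolean fitsInGossipValue(String s) { return utf8Length(s) <= MAX_STRING_BYTES; }

    public static void main(String[] args) {
        String legacy = legacyFormat(6, 150);
        String compact = compactFormat(6, 150);
        System.out.println("legacy bytes:  " + utf8Length(legacy) + ", fits: " + fitsInGossipValue(legacy));
        System.out.println("compact bytes: " + utf8Length(compact) + ", fits: " + fitsInGossipValue(compact));
    }
}
```

With these made-up numbers the legacy-style string exceeds the assumed limit while the compact-style one stays under it, which matches the shape of the failure described here: the same index set is only serializable once the newer format is selected.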
> h2. Questions
> * Is this understanding of the startup/version-gating behavior correct?
> * Is this a known limitation or bug in 5.0.x?
> * Are there recommended workarounds to bootstrap a homogeneous cluster in
> this state (short of migrating keyspaces to another cluster)?
> h2. Update
> I tested the hypothesis by hacking the
> [fix|https://github.com/apache/cassandra/commit/376fe2a9fe3f13c7555c40cda6d3912d55ef63cc]
> for the issue released by the Cassandra team in 5.0.3, specifically this
> method:
> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
> {
>     return minVersion == null || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
> }
>
> I changed it to always return {{false}}, so that the new, more compact
> index-status serialization is activated before the cluster stabilizes. With
> this change, *the cluster started successfully*. This is a hack; I'm hoping
> someone from the Cassandra team reads this and provides an official fix.