Paulo Henrique Abreu created CASSANDRA-21095:
------------------------------------------------
Summary: Paxos V2 emits errors after node decommission
Key: CASSANDRA-21095
URL: https://issues.apache.org/jira/browse/CASSANDRA-21095
Project: Apache Cassandra
Issue Type: Bug
Reporter: Paulo Henrique Abreu
During a controlled node decommission in a Cassandra cluster with Paxos V2
enabled, the cluster continued to emit Paxos V2–related errors after the node
had been fully removed from the ring. The errors indicate that Paxos V2
attempted to reconcile or finalize Paxos state still referencing the
decommissioned node, even though it was no longer part of the topology.

No immediate data loss was observed, but the errors persisted in the logs and
represent an operational risk for workloads relying on LWTs, potentially
leading to retries, increased latency, or instability.

As a workaround, Paxos was downgraded from Paxos V2 to Paxos V1 at the cluster
level. With Paxos V1 enabled, the required node decommission operations
completed successfully without Paxos-related errors. After the topology changes
were finalized and the cluster stabilized, Paxos was switched back to Paxos V2.
{{Error:}}
{{WARN [Messaging-EventLoop-3-7] 2025-12-07 03:40:17,548 OutboundConnection.java:491 - /10.10.12.144:7000->/10.10.12.144:7000-SMALL_MESSAGES-61fb9973 dropping message of type PAXOS2_CLEANUP_START_PREPARE_REQ due to error}}
{{org.apache.cassandra.net.InvalidSerializedSizeException: Invalid serialized size; expected 5312, actual size at least 5311, for verb PAXOS2_CLEANUP_START_PREPARE_REQ}}
{{ at org.apache.cassandra.net.OutboundConnection$EventLoopDelivery.doRun(OutboundConnection.java:819)}}
{{ at org.apache.cassandra.net.OutboundConnection$Delivery.run(OutboundConnection.java:690)}}
{{ at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)}}
{{ at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)}}
{{ at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)}}
{{ at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)}}
{{ at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)}}
{{ at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)}}
{{ at java.lang.Thread.run(Thread.java:748)}}
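
To gauge how often this happens and which verbs are affected, a log excerpt can be scanned for the exception message shown above. The following is only a minimal sketch: the function name find_size_errors is hypothetical, and the pattern assumes the message wording matches the excerpt in this report exactly.

```python
import re

# Matches the InvalidSerializedSizeException message as it appears in the
# excerpt above; the exact wording is an assumption taken from this report.
SIZE_ERROR = re.compile(
    r"Invalid serialized size; expected (\d+), "
    r"actual size at least (\d+), for verb (\w+)"
)


def find_size_errors(log_text):
    """Return (verb, expected_size, actual_size) for each size mismatch found."""
    # Collapse line wraps and repeated spaces so wrapped log lines still match.
    normalized = " ".join(log_text.split())
    return [
        (m.group(3), int(m.group(1)), int(m.group(2)))
        for m in SIZE_ERROR.finditer(normalized)
    ]
```

Passing the contents of a log file to find_size_errors returns one tuple per occurrence, which can then be counted per verb to see whether only the Paxos V2 cleanup verbs are affected.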
--
This message was sent by Atlassian Jira
(v8.20.10#820010)