Paulo Henrique Abreu created CASSANDRA-21095:
------------------------------------------------

             Summary: Paxos V2 emits errors after node decommission
                 Key: CASSANDRA-21095
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21095
             Project: Apache Cassandra
          Issue Type: Bug
            Reporter: Paulo Henrique Abreu


During a controlled node decommission in a Cassandra cluster with Paxos V2 
enabled, the cluster continued to emit Paxos V2–related errors after the node 
had been fully removed from the ring. The errors indicate that Paxos V2 
attempted to reconcile or finalize Paxos state still referencing the 
decommissioned node, even though it was no longer part of the topology.

No immediate data loss was observed, but the errors persisted in the logs and 
pose an operational risk for workloads relying on lightweight transactions 
(LWTs), potentially leading to retries, increased latency, or instability.

As a workaround, Paxos was downgraded from Paxos V2 to Paxos V1 at the cluster 
level. With Paxos V1 enabled, the required node decommission operations 
completed successfully without Paxos-related errors. After the topology changes 
were finalized and the cluster stabilized, Paxos was switched back to Paxos V2.


Error:
WARN  [Messaging-EventLoop-3-7] 2025-12-07 03:40:17,548 OutboundConnection.java:491 - /10.10.12.144:7000->/10.10.12.144:7000-SMALL_MESSAGES-61fb9973 dropping message of type PAXOS2_CLEANUP_START_PREPARE_REQ due to error
org.apache.cassandra.net.InvalidSerializedSizeException: Invalid serialized size; expected 5312, actual size at least 5311, for verb PAXOS2_CLEANUP_START_PREPARE_REQ
        at org.apache.cassandra.net.OutboundConnection$EventLoopDelivery.doRun(OutboundConnection.java:819)
        at org.apache.cassandra.net.OutboundConnection$Delivery.run(OutboundConnection.java:690)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
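
To gauge how often this error recurs and which verbs are affected, a log scan along the following lines may help. This is only a rough sketch: the function name count_size_errors and the regular expression are illustrative assumptions derived from the single message shown above, not part of Cassandra or its tooling.

```python
import re
from collections import Counter

# Pattern derived from the one exception message quoted in this report;
# other Cassandra versions may word the message differently (assumption).
SIZE_ERROR = re.compile(
    r"InvalidSerializedSizeException: Invalid serialized size; "
    r"expected (\d+), actual size at least (\d+), for verb (\w+)"
)


def count_size_errors(log_text):
    """Return a Counter mapping verb name to number of size errors seen."""
    # Pasted logs are often hard-wrapped, so collapse whitespace runs first.
    flat = " ".join(log_text.split())
    return Counter(match.group(3) for match in SIZE_ERROR.finditer(flat))
```

Feeding the contents of a node's system log into count_size_errors gives a per-verb tally, which could help confirm whether only PAXOS2_CLEANUP_START_PREPARE_REQ is affected.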



