Hi everyone,

I’d like to start a discussion about adjusting how Cassandra calculates blockFor during node replacements. The JIRA tracking this proposal is here:
https://issues.apache.org/jira/browse/CASSANDRA-20993

Problem Background

Today, during a replacement, the pending replica is always included when determining the required acknowledgments. For example, with RF=3 and LOCAL_QUORUM, the coordinator waits for three responses instead of two. Since replacement nodes are often bootstrapping and slow to respond, this can result in write timeouts or increased write latency, even though the client only requested acknowledgments from the natural replicas. This behavior effectively breaks the client contract by requiring more responses than the specified consistency level.

Proposed Change

For replacement scenarios only, exclude pending replicas from blockFor and require acknowledgments solely from natural replicas. Pending nodes will still receive writes, but their responses will not count toward satisfying the consistency level.

Responses from the node being replaced would also be ignored. Although it is uncommon for a replaced node to become reachable again, adding this safeguard avoids ambiguity and ensures correctness if that situation occurs.

This change would be disabled by default and controlled via a feature flag to avoid affecting existing deployments.

In my view, this behavior is effectively a bug because the coordinator waits for more acknowledgments than the client requested, leading to avoidable failures or latency. Since the issue affects correctness from the client perspective rather than introducing new semantics, it would be valuable to include this fix in the 4.x branches as well, with the behavior disabled by default where needed.

Motivation

This change:
- Prevents unnecessary write timeouts during replacements
- Reduces write latency by eliminating dependence on a busy pending replica
- Aligns server behavior with client expectations

Current Status

A PR for 4.1 is available here for review:
https://github.com/apache/cassandra/pull/4494

Feedback is welcome on both the implementation and the approach.

Next Steps

I’d appreciate input on:
1. Any correctness concerns for replacement scenarios
2. Whether a feature-flagged approach is acceptable

Thanks in advance for your feedback,
Runtian
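
P.S. For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of the RF=3 / LOCAL_QUORUM example and of which acknowledgments would count under the proposal. It is only an illustration: the function names, the exclude_pending parameter, and the node labels A to D are hypothetical and are not taken from Cassandra's code or from the PR.

```python
def quorum(rf: int) -> int:
    """Quorum size for a given replication factor."""
    return rf // 2 + 1


def block_for(rf: int, pending: int, exclude_pending: bool = False) -> int:
    """Acks the coordinator waits for (hypothetical model, not Cassandra's code).

    Current behavior adds every pending replica to the quorum size.
    The proposed behavior (exclude_pending=True) waits for the quorum only.
    """
    required = quorum(rf)
    if not exclude_pending:
        required += pending
    return required


def is_satisfied(acks: set, natural: set, replaced: str, rf: int) -> bool:
    """Proposed counting rule: only acks from natural replicas count,
    and an ack from the node being replaced is ignored."""
    counted = {node for node in acks if node in natural and node != replaced}
    return len(counted) >= quorum(rf)


# RF=3: natural replicas A, B, C; C is being replaced by pending node D.
natural = {"A", "B", "C"}

print(block_for(rf=3, pending=1), block_for(rf=3, pending=1, exclude_pending=True))
# prints: 3 2

print(is_satisfied({"A", "B"}, natural, replaced="C", rf=3))  # True
print(is_satisfied({"A", "D"}, natural, replaced="C", rf=3))  # False: D is pending
```

In this model the write completes once A and B acknowledge, no matter how slowly D responds. D still receives the write.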
