[D] Kvrocks gracefully failover design proposal [kvrocks]

via GitHub Sat, 18 Oct 2025 06:05:16 -0700


GitHub user yuzegao created a discussion: Kvrocks gracefully failover design 
proposal


Hi all,
In [#2848](https://github.com/apache/kvrocks/issues/2848), we plan to support 
a graceful (no-data-loss) failover command.
### Summary
Kvrocks today lacks a production-ready failover workflow that guarantees no 
data loss when a master is promoted manually. This proposal compares two 
feasible approaches:
1. In-process failover - Extends the Kvrocks core to perform a coordinated 
failover (the master suspends writes, master and slave swap roles, and writes 
are resumed).
2. Controller-based failover - Implements the failover in an external HA 
controller (kvrocks-controller) that uses native Kvrocks commands (INFO, 
CLIENT PAUSE, CLUSTERX SETNODES, etc.) to coordinate a safe switchover. This 
requires CLIENT PAUSE WRITE and CLIENT UNPAUSE to be implemented in Kvrocks.

Both approaches are viable. This document outlines the design and tradeoffs of 
each, in the hope of sparking community discussion on which approach to adopt. 
Note that this proposal does not cover non-cluster mode.

### Goals
Primary goal: Provide a failover mechanism to prevent data loss during manual 
maintenance (node migration, process upgrade).

### Option A — In-process (Node-local) Failover
#### Concept

Enhance the Kvrocks server binary so that a chosen slave within a shard 
cooperates with the master to complete the role swap: checking the replication 
offsets of master and slave, pausing writes on the master, waiting for the 
slave to catch up to the master's offset, swapping the master and slave roles, 
and resuming writes on the old master.
<img width="1636" height="1396" alt="image" 
src="https://github.com/user-attachments/assets/4775ef85-e9e4-4a15-a6c9-48124dc50934"
 />
#### Core Components/Steps

1. The slave node receives the FAILOVER command and notifies the master node 
to initiate a master-slave role swap.
2. The master node checks the slave node's offset and suspends write 
operations. Once it confirms that the slave's offset matches its own, it 
changes its own role to slave.
3. The slave node is notified to switch to the master role and take over the 
slots. If this step fails, the role change made in step 2 is rolled back.
4. Write operations are resumed, and all subsequent read and write requests 
received by the old master are forwarded to the new master node to ensure data 
consistency.
5. After the controller confirms that the failover succeeded, it executes 
CLUSTERX SETNODES to update the cluster topology.
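
As a rough illustration of steps 1 to 4, here is a minimal in-memory sketch of the role swap with rollback. It is not Kvrocks code: the `Node` class, its fields (`apply_rate`, `accept_promotion`) and the `graceful_failover` function are hypothetical names introduced only for this example, and replication catch-up is simulated with a simple counter.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one Kvrocks node in a shard (not a real API)."""
    name: str
    role: str                      # "master" or "slave"
    offset: int                    # replication offset
    writes_paused: bool = False
    apply_rate: int = 100          # offsets the slave applies per catch-up round
    accept_promotion: bool = True  # set to False to simulate a failure in step 3


def graceful_failover(master: Node, slave: Node, max_rounds: int = 3) -> bool:
    """Simulate steps 1-4; return True only if the roles were swapped."""
    # Step 2a: the master suspends writes, so its offset stops advancing.
    master.writes_paused = True

    # Step 2b: give the slave a bounded number of rounds to catch up.
    for _ in range(max_rounds):
        if slave.offset == master.offset:
            break
        slave.offset = min(master.offset, slave.offset + slave.apply_rate)

    if slave.offset != master.offset:
        # The slave is still behind: abort without changing any role.
        master.writes_paused = False
        return False

    # Step 2c: offsets match, the master demotes itself.
    master.role = "slave"

    # Step 3: the slave takes over; roll back step 2 if it cannot.
    if not slave.accept_promotion:
        master.role = "master"
        master.writes_paused = False
        return False
    slave.role = "master"

    # Step 4: writes are resumed on the old master.
    master.writes_paused = False
    return True


if __name__ == "__main__":
    old_master = Node("node-a", "master", offset=1000)
    replica = Node("node-b", "slave", offset=900)
    print(graceful_failover(old_master, replica), old_master.role, replica.role)
```

The point of the sketch is the ordering: no role changes until the offsets match, and the master's demotion is the only state that has to be undone if promotion fails.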

#### Advantages

1. Autonomous strategy: Fewer interactive steps means less risk of anomalies.
2. Lowest latency: Fewer round trips to the external controller.

#### Disadvantages/Risks

1. Highly invasive changes to the core; increased code complexity and a larger 
testing surface.
2. Operational debugging and observability become more difficult (the failover 
logic lives inside the server process).

### Option B — Controller-Based Failover
#### Concept

An external, highly available Kvrocks controller is responsible for 
master-slave role switches and cluster topology updates. The controller checks 
the replication offsets of the master and slave nodes, pauses writes to the 
master, updates the master and slave roles and the cluster topology, and 
resumes writes on the old master. The controller also handles exceptions 
through fallback procedures such as retries and rollbacks.
<img width="1774" height="1554" alt="image" 
src="https://github.com/user-attachments/assets/40743437-e9cf-446f-bcd9-75439744e8d1"
 />
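
To make the offset check concrete, the sketch below computes how far a slave is behind the master from the text of an INFO replication reply. The field names used here (`master_repl_offset` and `slaveN:ip=...,port=...,offset=...`) are an assumption modeled on Redis-style output; the exact fields Kvrocks emits should be verified before relying on this. `parse_info` and `slave_offset_gap` are hypothetical helper names.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse 'key:value' lines of an INFO reply, skipping section headers."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def slave_offset_gap(master_info: str, slave_addr: str) -> Optional[int]:
    """Return master offset minus slave offset, or None if the slave is not listed.

    Assumes Redis-style fields; this layout is an assumption, not confirmed
    Kvrocks output.
    """
    fields = parse_info(master_info)
    master_offset = int(fields["master_repl_offset"])
    for key, value in fields.items():
        if not key.startswith("slave") or "=" not in value:
            continue
        attrs = dict(
            pair.strip().split("=", 1) for pair in value.split(",") if "=" in pair
        )
        if f"{attrs.get('ip')}:{attrs.get('port')}" == slave_addr:
            return master_offset - int(attrs["offset"])
    return None


# Hand-written sample in the assumed format (not captured from a real server).
SAMPLE = """# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.2,port=6666,offset=990,lag=0
master_repl_offset:1000
"""

if __name__ == "__main__":
    print(slave_offset_gap(SAMPLE, "10.0.0.2:6666"))
```

A gap of zero after writes are paused is the signal that the roles can be switched safely.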
#### Core Components/Steps

1. The controller checks the replication lag between the target slave and the 
master (INFO replication).
2. If the lag is small enough, the controller pauses writes to the master 
(CLIENT PAUSE WRITE needs to be implemented).
3. Once the slave has caught up with the master (their replication offsets 
match), the controller switches the slave's role to master and updates its 
topology (CLUSTERX SETNODES).
4. The controller switches the old master's role to slave and updates its 
topology.
5. The controller resumes the paused writes on the old master (CLIENT UNPAUSE 
needs to be implemented).
6. The controller then updates the topology information on the other nodes in 
the cluster.
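
The six steps above can be sketched as a single orchestration function. This is an illustrative sketch only: `FakeNode` is a hypothetical in-memory stub rather than a real client, the command strings are just labels (CLIENT PAUSE WRITE and CLIENT UNPAUSE do not exist in Kvrocks yet, per this proposal), and the choice to keep the old master paused when step 4 keeps failing is an assumption of this example, not part of the proposal.

```python
from typing import List, Optional


class FakeNode:
    """Hypothetical in-memory stub that records the commands it receives."""

    def __init__(self, name: str, offset: int, fail_on: Optional[str] = None):
        self.name = name
        self.offset = offset
        self.fail_on = fail_on          # command label that should always fail
        self.log: List[str] = []
        self.paused = False
        self.topology_version = 0

    def send(self, command: str, version: int = 0) -> None:
        if command == self.fail_on:
            raise RuntimeError(f"{self.name}: {command} failed")
        self.log.append(command)
        if command == "CLIENT PAUSE WRITE":
            self.paused = True
        elif command == "CLIENT UNPAUSE":
            self.paused = False
        elif command == "CLUSTERX SETNODES":
            self.topology_version = version


def controller_failover(master: FakeNode, slave: FakeNode, others: List[FakeNode],
                        version: int, max_gap: int = 100, retries: int = 3) -> bool:
    # Step 1: refuse to start if the slave is too far behind.
    if master.offset - slave.offset > max_gap:
        return False

    # Step 2: pause writes so the master's offset stops moving.
    master.send("CLIENT PAUSE WRITE")

    # Step 3: wait for the slave to catch up (simulated here), then promote it.
    slave.offset = master.offset
    try:
        slave.send("CLUSTERX SETNODES", version)
    except RuntimeError:
        # Nothing has changed yet, so resuming writes is safe.
        master.send("CLIENT UNPAUSE")
        return False

    # Step 4: demote the old master, retrying on failure.
    for _ in range(retries):
        try:
            master.send("CLUSTERX SETNODES", version)
            break
        except RuntimeError:
            continue
    else:
        # Assumption of this sketch: keep the old master paused so that two
        # writable masters never coexist, and leave recovery to the operator.
        return False

    # Step 5: resume writes on the old master.
    master.send("CLIENT UNPAUSE")

    # Step 6: propagate the new topology to the rest of the cluster.
    for node in others:
        node.send("CLUSTERX SETNODES", version)
    return True


if __name__ == "__main__":
    m, s, o = FakeNode("m", 1000), FakeNode("s", 990), FakeNode("o", 0)
    print(controller_failover(m, s, [o], version=2), m.log)
```

The sketch makes the exception-handling burden visible: each step needs its own decision about whether to roll back, retry, or stop and wait for an operator.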

#### Advantages

1. Operationally friendly: The controller is auditable and versioned, making it 
easier to iterate.
2. Low risk: Changes to kvrocks are small and testable.

#### Disadvantages/Tradeoffs

1. The controller and Kvrocks need multiple rounds of interaction, so the 
controller must handle multiple failure cases (retries and rollbacks).
2. Slightly higher failover latency (controller coordination); this is 
acceptable in most environments.
3. The controller must be highly reliable, since it becomes critical 
infrastructure (but it is easier to monitor and maintain than core logic).


GitHub link: https://github.com/apache/kvrocks/discussions/3218
