GitHub user yuzegao created a discussion: Kvrocks gracefully failover design proposal
Hi All,

In [#2848](https://github.com/apache/kvrocks/issues/2848), we plan to support a graceful (no-data-loss) failover command.

### Summary

Kvrocks today lacks a production-ready failover workflow that guarantees no data loss when a master is manually switched over. This proposal compares two feasible approaches:

1. In-process failover - Extend the Kvrocks core to perform a coordinated failover (the master suspends writes, the master and slave swap roles, and writes are released).
2. Controller-based failover - Implement the failover in an external HA controller (kvrocks-controller) that uses native kvrocks commands (INFO, CLIENT PAUSE, CLUSTERX SETNODES, etc.) to coordinate safe upgrades. This approach requires CLIENT PAUSE WRITE and CLIENT UNPAUSE to be implemented in Kvrocks.

Both approaches are viable. This document outlines the design and tradeoffs of each, hoping to spark community discussion on which approach to adopt. Non-cluster mode is out of scope for this proposal.

### Goals

Primary goal: Provide a failover mechanism that prevents data loss during manual maintenance (node migration, process upgrade).

### Option A: In-process (Node-local) Failover

#### Concept

Enhance the Kvrocks server binary so that a given slave within a shard cooperates with the master to complete the master-slave reversal: checking the master and slave replication offsets, pausing writes on the master, waiting for the slave's offset to catch up with the master's, reversing the master and slave roles, and resuming writes on the old master.

<img width="1636" height="1396" alt="image" src="https://github.com/user-attachments/assets/4775ef85-e9e4-4a15-a6c9-48124dc50934" />

#### Core Components/Steps

1. The replica node receives the FAILOVER command and notifies the master node to initiate a master-slave reversal.
2. The master node checks the slave node's offset, suspends write operations, and, once it confirms that the slave node's offset matches its own, changes its own role to slave.
3. The slave node is notified to switch to the master role and take over the slots. If this step fails, the role change made in step 2 is rolled back.
4. Write operations are resumed, and all subsequent read and write requests received by the old master are forwarded to the new master node to ensure data consistency.
5. After the controller confirms that the failover succeeded, it executes CLUSTERX SETNODES to update the cluster topology.

#### Advantages

1. Autonomous strategy: Fewer interactive steps means less risk of anomalies.
2. Lowest latency: Fewer round trips to the external controller.

#### Disadvantages/Risks

1. Largely invasive changes to the core; increased code complexity and a larger testing surface.
2. Operational debugging and observability become more difficult (the failover logic lives inside the process).

### Option B: Controller-Based Failover

#### Concept

An external, highly available kvrocks controller is responsible for master-slave switchovers and cluster topology updates. The controller checks the replication offsets of the master and slave nodes, pauses writes to the master, updates the master and slave roles and the cluster topology, and resumes writes on the old master. The controller also runs fallback procedures, such as retries and rollbacks, when a step fails.

<img width="1774" height="1554" alt="image" src="https://github.com/user-attachments/assets/40743437-e9cf-446f-bcd9-75439744e8d1" />

#### Core Components/Steps

1. The controller checks the replication lag between the target slave and the master (INFO replication).
2. If the lag is manageable, the controller pauses writes to the master (CLIENT PAUSE WRITE needs to be implemented).
3. Once the slave's replication offset matches the master's, the controller switches the slave's role to master and updates its topology (CLUSTERX SETNODES).
4. The controller switches the old master's role to slave and updates its topology.
5. The controller resumes the paused writes on the old master (CLIENT UNPAUSE needs to be implemented).
6. The controller continues by updating the topology information on the other nodes in the cluster.

#### Advantages

1. Operationally friendly: The controller is auditable and versioned, making it easier to iterate.
2. Low risk: Changes to kvrocks are small and testable.

#### Disadvantages/Tradeoffs

1. The controller and kvrocks require multiple rounds of interaction, so multiple failure cases must be handled (retries and rollbacks).
2. Slightly higher failover latency (controller coordination); this is acceptable in most environments.
3. The controller must be highly reliable, since it becomes critical infrastructure (though it is easier to monitor and maintain than core logic).

GitHub link: https://github.com/apache/kvrocks/discussions/3218
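
To make the Option B sequence easier to discuss, here is a minimal Python sketch of how a controller might order the six steps and undo them on failure. It is an illustration under stated assumptions, not the kvrocks-controller implementation: `FakeNode` and all of its methods are hypothetical in-memory stand-ins, `max_lag` and `max_polls` are made-up tuning knobs, and CLIENT PAUSE WRITE / CLIENT UNPAUSE are the commands this proposal says still need to be implemented.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeNode:
    """Hypothetical in-memory stand-in for a Kvrocks node (illustration only)."""
    name: str
    role: str
    offset: int
    upstream: Optional["FakeNode"] = None  # set on a slave to simulate replication
    step: int = 0                          # simulated progress per offset poll
    writes_paused: bool = False
    known_master: str = ""
    log: List[str] = field(default_factory=list)

    def replication_offset(self) -> int:
        # A real controller would parse this from INFO replication.
        if self.upstream is not None:
            self.offset = min(self.upstream.offset, self.offset + self.step)
        return self.offset

    def pause_writes(self) -> None:        # stands in for the proposed CLIENT PAUSE WRITE
        self.writes_paused = True
        self.log.append("pause")

    def unpause_writes(self) -> None:      # stands in for the proposed CLIENT UNPAUSE
        self.writes_paused = False
        self.log.append("unpause")

    def set_role(self, role: str) -> None:  # stands in for a CLUSTERX SETNODES update
        self.role = role
        self.log.append(f"role={role}")

    def update_topology(self, master_name: str) -> None:
        self.known_master = master_name


def controller_failover(master, slave, others=(), max_lag=100,
                        max_polls=50, poll_interval=0.0) -> bool:
    """Run the Option B steps; return True only if the roles were swapped."""
    # Step 1: refuse to start if the slave is too far behind.
    if master.replication_offset() - slave.replication_offset() > max_lag:
        return False
    # Step 2: pause writes so the master's offset stops moving.
    master.pause_writes()
    try:
        # Step 3a: wait until the slave has caught up with the master.
        for _ in range(max_polls):
            if slave.replication_offset() >= master.replication_offset():
                break
            time.sleep(poll_interval)
        else:
            return False  # never caught up; roles are left untouched
        # Step 3b: promote the slave.
        slave.set_role("master")
        try:
            # Step 4: demote the old master.
            master.set_role("slave")
        except Exception:
            slave.set_role("slave")  # roll back step 3b
            raise
    finally:
        # Step 5: always resume writes on the old master.
        master.unpause_writes()
    # Step 6: propagate the new topology to the remaining nodes.
    for node in others:
        node.update_topology(slave.name)
    return True


if __name__ == "__main__":
    m = FakeNode("node-a", "master", offset=1000)
    s = FakeNode("node-b", "slave", offset=990, upstream=m, step=4)
    print(controller_failover(m, s), m.role, s.role)
```

Putting the unpause in a `finally` block is one possible design choice: whatever goes wrong between steps 2 and 4, the old master never stays paused. The sketch does not cover a controller crash in the middle of the sequence, which is a case a pause timeout would have to handle.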
