Kaustubh1204 commented on issue #268:
URL: 
https://github.com/apache/kvrocks-controller/issues/268#issuecomment-3892268749

   Hi @RiversJin and @git-hulk,
   I would like to take ownership of this issue.
   From my understanding, the core problem lies in the non-atomic update 
sequence between Kvrocks (data plane) and etcd (control plane). If SetSlot 
succeeds but UpdateCluster fails, the version in Kvrocks can advance beyond the 
version stored in etcd, leading to a divergence that may block subsequent 
controller operations. The controller can then get stuck: every later version 
bump it attempts is rejected by Kvrocks because the new version is no longer 
strictly greater than the already-applied one.
   Reversing the update order (etcd first, then Kvrocks) can mitigate this 
specific failure mode, but I believe we should also clearly define version 
authority and consider reconciliation logic to handle cases where Kvrocks is 
already ahead. Ensuring idempotency and convergence under partial failures will 
be critical.
   My plan is to:
   1. Trace the full slot migration flow and version bump logic.
   2. Identify all failure points between SetSlot and UpdateCluster.
   3. Propose a safe update sequence with proper CAS/version checks.
   4. Evaluate whether reconciliation logic is needed when Kvrocks reports a 
higher version.
   I’ll share a more concrete design proposal before implementing changes to 
ensure alignment.
   Please let me know if you have any specific constraints or considerations I 
should keep in mind.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
