(kvrocks) branch unstable updated: fix(replication): potential deadlock when switching master frequently (#2516)

hulk Mon, 02 Sep 2024 00:10:12 -0700

This is an automated email from the ASF dual-hosted git repository.

hulk pushed a commit to branch unstable
in repository https://gitbox.apache.org/repos/asf/kvrocks.git



The following commit(s) were added to refs/heads/unstable by this push:
     new ab41cbb6 fix(replication): potential deadlock  when switching master 
frequently (#2516)
ab41cbb6 is described below

commit ab41cbb69d081711437e25846c86c920cb5253e1
Author: hulk <[email protected]>
AuthorDate: Mon Sep 2 15:10:02 2024 +0800

    fix(replication): potential deadlock  when switching master frequently 
(#2516)
    
    This closes #2512.
    
    Currently, the replication thread will wait for the worker's exclusive 
guard stop before closing db.
    But it now stops the worker from running new commands after acquiring the 
worker's exclusive guard,
    and it might cause deadlock if switches at the same time.
    
    The following steps will show how it may happen:
    
    - T0: client A sent `slaveof MASTER_IP0 MASTER_PORT0`, then the replication 
thread was started and waiting for the exclusive guard.
    
    - T1: client B sent `slaveof MASTER_IP1 MASTER_PORT1` and `AddMaster` will 
stop the previous replication thread, which is waiting for the exclusive guard. 
But the exclusive guard is acquired by the current thread.
    
    The workaround is also straightforward, just stop workers from running new 
commands by enabling `is_loading_` to
    true before acquiring the lock in the replication thread.
---
 src/server/server.cc | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/server/server.cc b/src/server/server.cc
index 3381ffbc..393a70d8 100644
--- a/src/server/server.cc
+++ b/src/server/server.cc
@@ -1353,14 +1353,16 @@ void Server::PrepareRestoreDB() {
   // accessing, data migration task should be stopped before restoring DB
   WaitNoMigrateProcessing();
 
+  // Workers will disallow to run commands which may access DB, so we should
+  // enable this flag to stop workers from running new commands. And wait for
+  // the exclusive guard to be released to guarantee no worker is running.
+  is_loading_ = true;
+
   // To guarantee work threads don't access DB, we should release 
'ExclusivityGuard'
   // ASAP to avoid user can't receive responses for long time, because the 
following
   // 'CloseDB' may cost much time to acquire DB mutex.
   LOG(INFO) << "[server] Waiting workers for finishing executing commands...";
-  {
-    auto exclusivity = WorkExclusivityGuard();
-    is_loading_ = true;
-  }
+  { auto exclusivity = WorkExclusivityGuard(); }
 
   // Cron thread, compaction checker thread, full synchronization thread
   // may always run in the background, we need to close db, so they don't 
actually work.

(kvrocks) branch unstable updated: fix(replication): potential deadlock when switching master frequently (#2516)

Reply via email to