When three or more partitions merge, none of them may see enough clean
nodes. DLM then stays stuck until an administrator manually
resets/restarts enough nodes to produce sufficient clean nodes.
However, some users want DLM to recover automatically from this
"useless" state by forcibly kicking the stateful merged nodes.

The "enable_force_kick" option defaults to "0" (disabled), which keeps
the old behavior. Enable this option at your own risk: it is hard to
predict which node (if any) will survive when both sides of the merged
partitions kick each other out of the cluster at the same time.

Signed-off-by: Eric Ren <z...@suse.com>
---
 dlm_controld/daemon_cpg.c   | 6 +++++-
 dlm_controld/dlm.conf.5     | 2 ++
 dlm_controld/dlm_controld.8 | 5 +++++
 dlm_controld/dlm_daemon.h   | 1 +
 dlm_controld/main.c         | 6 ++++++
 5 files changed, 19 insertions(+), 1 deletion(-)
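
Note for reviewers (not part of the commit): the change to the kick
condition in daemon_fence_work() can be read as the small predicate
sketched below. This is only an illustration under assumptions. The
struct merge_state and the function should_kick() are hypothetical names
that do not exist in dlm_controld; the field names mirror the variables
in the hunk, and their meanings are inferred from the log message.

```c
#include <stdbool.h>

/* Hypothetical container for the values used by the patched condition. */
struct merge_state {
	int merge_count; /* "merge" value from the fence work log line */
	int clean_count; /* "clean" value from the fence work log line */
	int part_count;  /* "part" value from the fence work log line */
	int low;         /* value compared with our_nodeid in the hunk */
	int our_nodeid;  /* nodeid of the local node */
};

/*
 * Mirrors the patched condition: kick when there are enough clean nodes
 * OR when force kick is enabled, and in both cases only if part_count
 * is zero and low equals our_nodeid.
 */
static bool should_kick(const struct merge_state *s, bool enable_force_kick)
{
	return (s->clean_count >= s->merge_count || enable_force_kick)
	       && !s->part_count && (s->low == s->our_nodeid);
}
```

With clean_count = 1 and merge_count = 2, should_kick() returns false
when the option is off (the old behavior, which waits) and true when it
is on, provided part_count is 0 and low equals our_nodeid.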

diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
index 356e80d..a09971d 100644
--- a/dlm_controld/daemon_cpg.c
+++ b/dlm_controld/daemon_cpg.c
@@ -845,7 +845,11 @@ static void daemon_fence_work(void)
                log_retry(retry_fencing, "fence work wait to clear merge %d clean %d part %d gone %d",
                          merge_count, clean_count, part_count, gone_count);
 
-               if ((clean_count >= merge_count) && !part_count && (low == our_nodeid))
+               if(opt(enable_force_kick_ind))
+                       log_retry(retry_fencing, "fence work force to kick stateful merged members");
+
+               if ((clean_count >= merge_count || opt(enable_force_kick_ind))
+                   && !part_count && (low == our_nodeid))
                        kick_stateful_merge_members();
 
                retry = 1;
diff --git a/dlm_controld/dlm.conf.5 b/dlm_controld/dlm.conf.5
index 007e4de..4dc1ba4 100644
--- a/dlm_controld/dlm.conf.5
+++ b/dlm_controld/dlm.conf.5
@@ -68,6 +68,8 @@ enable_quorum_fencing
 .br
 enable_quorum_lockspace
 .br
+enable_force_kick
+.br
 
 .SH Fencing
 
diff --git a/dlm_controld/dlm_controld.8 b/dlm_controld/dlm_controld.8
index c9011fd..c424f41 100644
--- a/dlm_controld/dlm_controld.8
+++ b/dlm_controld/dlm_controld.8
@@ -87,6 +87,11 @@ For default settings, see dlm_controld -h.
 0|1
         enable/disable quorum requirement for lockspace operations
 
+.B --enable_force_kick | -k
+0|1
+        enable/disable forcing kick when cluster is stuck waiting
+        for administrator to manually produce enough clean nodes
+
 .B --fence_all
 .I str
         fence all nodes with this agent
diff --git a/dlm_controld/dlm_daemon.h b/dlm_controld/dlm_daemon.h
index 62508ea..bdaf6bc 100644
--- a/dlm_controld/dlm_daemon.h
+++ b/dlm_controld/dlm_daemon.h
@@ -108,6 +108,7 @@ enum {
         enable_startup_fencing_ind,
         enable_quorum_fencing_ind,
         enable_quorum_lockspace_ind,
+       enable_force_kick_ind,
         help_ind,
         version_ind,
         dlm_options_max,
diff --git a/dlm_controld/main.c b/dlm_controld/main.c
index 4f1399f..354db44 100644
--- a/dlm_controld/main.c
+++ b/dlm_controld/main.c
@@ -1355,6 +1355,12 @@ static void set_opt_defaults(void)
                        1, NULL,
                        "enable/disable quorum requirement for lockspace operations");
 
+       set_opt_default(enable_force_kick_ind,
+                       "enable_force_kick", 'k', req_arg_bool,
+                       0, NULL,
+                       "enable/disable forcing kick when cluster is stuck waiting "
+                       "for administrator to manually produce enough clean nodes");
+
        set_opt_default(help_ind,
                        "help", 'h', no_arg,
                        -1, NULL,
-- 
2.6.6
