When three or more partitions merge, none of them may see enough clean
nodes. DLM then stays stuck until an administrator manually resets or
restarts enough nodes to produce sufficient clean nodes. However, some
users want DLM to recover automatically from this "useless" state by
forcibly kicking stateful merged nodes.

The "enable_force_kick" option defaults to "0" (disabled), which
preserves the old behavior. Note: enable this option at your own risk,
because it is hard to predict which node (if any) will survive when both
sides of the merged partitions kick each other out of the cluster at the
same time.

Signed-off-by: Eric Ren <z...@suse.com>
---
 dlm_controld/daemon_cpg.c   | 6 +++++-
 dlm_controld/dlm.conf.5     | 2 ++
 dlm_controld/dlm_controld.8 | 5 +++++
 dlm_controld/dlm_daemon.h   | 1 +
 dlm_controld/main.c         | 6 ++++++
 5 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
index 356e80d..a09971d 100644
--- a/dlm_controld/daemon_cpg.c
+++ b/dlm_controld/daemon_cpg.c
@@ -845,7 +845,11 @@ static void daemon_fence_work(void)
 		log_retry(retry_fencing, "fence work wait to clear merge %d clean %d part %d gone %d",
 			  merge_count, clean_count, part_count, gone_count);
 
-		if ((clean_count >= merge_count) && !part_count && (low == our_nodeid))
+		if(opt(enable_force_kick_ind))
+			log_retry(retry_fencing, "fence work force to kick stateful merged members");
+
+		if ((clean_count >= merge_count || opt(enable_force_kick_ind))
+				&& !part_count && (low == our_nodeid))
 			kick_stateful_merge_members();
 
 		retry = 1;
diff --git a/dlm_controld/dlm.conf.5 b/dlm_controld/dlm.conf.5
index 007e4de..4dc1ba4 100644
--- a/dlm_controld/dlm.conf.5
+++ b/dlm_controld/dlm.conf.5
@@ -68,6 +68,8 @@ enable_quorum_fencing
 .br
 enable_quorum_lockspace
 .br
+enable_force_kick
+.br
 
 .SH Fencing
 
diff --git a/dlm_controld/dlm_controld.8 b/dlm_controld/dlm_controld.8
index c9011fd..c424f41 100644
--- a/dlm_controld/dlm_controld.8
+++ b/dlm_controld/dlm_controld.8
@@ -87,6 +87,11 @@ For default settings, see dlm_controld -h.
 0|1
         enable/disable quorum requirement for lockspace operations
 
+.B --enable_force_kick | -k
+0|1
+        enable/disable forcing kick when cluster is stuck waiting
+        for administrator to manually produce enough clean nodes
+
 .B --fence_all
 .I str
         fence all nodes with this agent
diff --git a/dlm_controld/dlm_daemon.h b/dlm_controld/dlm_daemon.h
index 62508ea..bdaf6bc 100644
--- a/dlm_controld/dlm_daemon.h
+++ b/dlm_controld/dlm_daemon.h
@@ -108,6 +108,7 @@ enum {
 	enable_startup_fencing_ind,
 	enable_quorum_fencing_ind,
 	enable_quorum_lockspace_ind,
+	enable_force_kick_ind,
 	help_ind,
 	version_ind,
 	dlm_options_max,
diff --git a/dlm_controld/main.c b/dlm_controld/main.c
index 4f1399f..354db44 100644
--- a/dlm_controld/main.c
+++ b/dlm_controld/main.c
@@ -1355,6 +1355,12 @@ static void set_opt_defaults(void)
 			1, NULL,
 			"enable/disable quorum requirement for lockspace operations");
 
+	set_opt_default(enable_force_kick_ind,
+			"enable_force_kick", 'k', req_arg_bool,
+			0, NULL,
+			"enable/disable forcing kick when cluster is stuck waiting "
+			"for administrator to manually produce enough clean nodes");
+
 	set_opt_default(help_ind,
 			"help", 'h', no_arg,
 			-1, NULL,
-- 
2.6.6
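
For readers who want to reason about the changed condition in daemon_fence_work() in isolation, the following is a minimal stand-alone sketch of it. The function should_kick() and its parameter list are hypothetical and exist only for illustration; in dlm_controld the values come from local variables and from opt(enable_force_kick_ind), and the real code calls kick_stateful_merge_members() rather than returning a flag.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the patched condition. The parameter names
 * mirror the variables used in the hunk above; nothing here is part
 * of dlm_controld itself.
 */
static bool should_kick(int clean_count, int merge_count, int part_count,
			int low, int our_nodeid, bool enable_force_kick)
{
	/* Old behavior: kick only when enough clean nodes are visible. */
	bool enough_clean = clean_count >= merge_count;

	/*
	 * New behavior: the option bypasses only the clean-node check.
	 * There must still be no partitioned nodes, and only the node
	 * with the lowest nodeid performs the kick.
	 */
	return (enough_clean || enable_force_kick)
		&& !part_count && (low == our_nodeid);
}
```

With clean_count = 1 and merge_count = 2 on the lowest-nodeid node, this returns false when the option is disabled and true when it is enabled. The option does not override the part_count or low-nodeid checks, so a node that is not the lowest still does not kick.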