On Thu, May 12, 2016 at 05:16:08PM +0800, Eric Ren wrote:
> DLM would be stuck in "need fencing" state, although cluster can
> regain quorum very quickly after a network transient disconnection.
> 
> It's possible that this process happens within one monoclock. It
> means "cluster_quorate_monotime" can equal "node->daemon_rem_time".
> We now skip this chance of telling corosync to kill cluster for
> stateful merge. As a result, any fencing cannot proceed further.

Hi Eric, thanks for looking at this.  It's a notoriously difficult
situation to sort out.  I'm not sure we have the same understanding of how
the behavior will change with your patch, so let's walk through some
examples.  Please let me know if they don't match what you see
(it's been quite a while since I actually tested this).

T = time in seconds, A,B,C = cluster nodes.

At T=1 A,B,C become members and have quorum.
At T=10 a partition creates A,B | C.
At T=11 it merges and creates A,B,C.

At T=12, A,B will have:
cluster_quorate=1
cluster_quorate_monotime=1
C->daemon_rem_time=10

At T=12, C will have:
cluster_quorate=1
cluster_quorate_monotime=11
A->daemon_rem_time=10
B->daemon_rem_time=10

Result:

A,B will kick C from the cluster because
cluster_quorate_monotime (1) < C->daemon_rem_time (10),
which is what we want.

C will not kick A,B from the cluster because
cluster_quorate_monotime (11) > A->daemon_rem_time (10),
which is what we want.

It's the simpler case, but does that sound right so far?

...

If the partition and merge occur within the same second, then:

At T=1 A,B,C become members and get quorum.
At T=10 a partition creates A,B | C.
At T=10 it merges and creates A,B,C.

At T=12, A,B will have:
cluster_quorate=1
cluster_quorate_monotime=1
C->daemon_rem_time=10

At T=12, C will have:
cluster_quorate=1
cluster_quorate_monotime=10
A->daemon_rem_time=10
B->daemon_rem_time=10

Result:

A,B will kick C from the cluster because
cluster_quorate_monotime (1) < C->daemon_rem_time (10),
which is what we want.

C will not kick A,B from the cluster because
cluster_quorate_monotime (10) = A->daemon_rem_time (10),
which is what we want.

If that's correct, there doesn't seem to be a problem so far.
If we apply your patch, won't it allow C to kick A,B from the
cluster since cluster_quorate_monotime = A->daemon_rem_time?
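
To make the comparison concrete, here is a minimal standalone sketch of
just the condition that guards the kill in receive_protocol().  The helper
should_kill() and its "inclusive" flag are hypothetical names introduced
for illustration; the real code also checks node->killed and
cluster_two_node, which are left out here.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of dlm_controld.  It models only the
 * timestamp test discussed above:
 *   inclusive = 0  ->  current behavior, "<"
 *   inclusive = 1  ->  behavior with the patch, "<="
 * Returns 1 if the local node would ask corosync to kill the remote node.
 */
static int should_kill(int cluster_quorate,
                       uint64_t cluster_quorate_monotime,
                       uint64_t daemon_rem_time,
                       int inclusive)
{
	/* No quorum, or the node was never removed: nothing to do. */
	if (!cluster_quorate || !daemon_rem_time)
		return 0;

	if (inclusive)
		return cluster_quorate_monotime <= daemon_rem_time;

	return cluster_quorate_monotime < daemon_rem_time;
}
```

With the same-second numbers above, should_kill(1, 1, 10, 0) is 1 (A,B
kick C under either test), should_kill(1, 10, 10, 0) is 0 (C leaves A,B
alone today), and should_kill(1, 10, 10, 1) is 1 (C would kick A,B with
"<=").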

...

If you're looking at a cluster with an equal partition, e.g. A,B | C,D,
then it becomes messy because cluster_quorate_monotime = daemon_rem_time
everywhere after the merge.  In this case, no nodes will kick others from
the cluster, but with your patch, each side will kick the other side from
the cluster.  Neither option is good.  In the past we decided to let the
cluster sit in this state so an admin could choose which nodes to remove.
Do you prefer the alternative of kicking nodes in this case (with somewhat
unpredictable results)?  If so, we could make that an optional setting,
but we'd want to keep the existing behavior for non-even partitions, as in
the examples above.
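
As a rough illustration of the even-split case, the sketch below counts
how many kill requests the whole cluster would issue when every node sees
cluster_quorate_monotime equal to daemon_rem_time.  The struct, the
count_kills() helper, the kill_on_equal switch (standing in for the
optional setting mentioned above), and the value T=10 are all assumptions
made up for this example; none of them exist in dlm_controld.

```c
#include <stdint.h>

#define NODES 4

/* Hypothetical per-node view after the merge of A,B | C,D. */
struct view {
	int side;                           /* 0 = A,B   1 = C,D */
	uint64_t cluster_quorate_monotime;
};

/*
 * Count kill requests across the cluster.
 * kill_on_equal = 0 keeps the current "<" test,
 * kill_on_equal = 1 behaves like the proposed "<=" test.
 */
static int count_kills(const struct view v[NODES],
                       uint64_t daemon_rem_time,
                       int kill_on_equal)
{
	int kills = 0;

	for (int i = 0; i < NODES; i++) {
		for (int j = 0; j < NODES; j++) {
			/* Only nodes from the other side were removed. */
			if (v[i].side == v[j].side)
				continue;

			if (v[i].cluster_quorate_monotime < daemon_rem_time ||
			    (kill_on_equal &&
			     v[i].cluster_quorate_monotime == daemon_rem_time))
				kills++;
		}
	}
	return kills;
}

/* All four nodes see equal timestamps; 10 is an arbitrary example value. */
static const struct view even_split[NODES] = {
	{ 0, 10 }, { 0, 10 }, { 1, 10 }, { 1, 10 },
};
```

Here count_kills(even_split, 10, 0) is 0 (the cluster sits and waits for
an admin), while count_kills(even_split, 10, 1) is 8 (each of the four
nodes tries to kick both nodes on the other side).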


> diff --git a/dlm_controld/daemon_cpg.c b/dlm_controld/daemon_cpg.c
> index 356e80d..cd8a4e2 100644
> --- a/dlm_controld/daemon_cpg.c
> +++ b/dlm_controld/daemon_cpg.c
> @@ -1695,7 +1695,7 @@ static void receive_protocol(struct dlm_header *hd, int len)
>               node->stateful_merge = 1;
>  
>               if (cluster_quorate && node->daemon_rem_time &&
> -                 cluster_quorate_monotime < node->daemon_rem_time) {
> +                 cluster_quorate_monotime <= node->daemon_rem_time) {
>                       if (!node->killed) {
>                               if (cluster_two_node) {
>                                       /*
> -- 
> 2.6.6
