On Wed, May 18, 2016 at 02:53:00PM +0800, Eric Ren wrote:
> Q1: what's stateful merged node?

> Q2: what if we add the stateful merged nodes to dlm_controld daemon
> cpg instead of fencing them?

The details here are fundamental to the way dlm works because the dlm
depends on the properties of Virtual Synchrony.  Partitions obviously
violate VS.  ("Extended" forms of virtual synchrony deal with partitions,
but they are not very practical.  Unfortunately, corosync implements one
of these extended forms of VS, which means any application that requires
strict VS has to implement an equivalent of this "stateful merging"
detection that's in the dlm.)

With VS, message/membership events change the state being kept consistent
among nodes.  When a partition occurs, nodes have divergent events and
inconsistent state.  The partition is simple to understand, because
partitioned nodes are indistinguishable from failed nodes and are treated
as such.  But, if partitioned nodes merge, the inconsistent state has to
be made consistent.  This must be done in the same way a new node is added
to an existing node, which means doing "state transfer" from the existing
node to the new node to make the state consistent between them.
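
To make the divergence and the state transfer concrete, here is a toy
sketch.  It is not dlm or corosync code; the Node class, its methods and
the event strings are made up for illustration.  It only assumes that
each node applies the events it sees, in order, to a local copy of state.

```python
# Toy model, not dlm code: every name here is hypothetical.
class Node:
    def __init__(self, name):
        self.name = name
        self.state = []          # ordered log of events applied so far

    def deliver(self, event):
        # Apply one message/membership event to the local state.
        self.state.append(event)

    def state_transfer_from(self, other):
        # A joining node discards whatever it had and copies the
        # existing member's state.
        self.state = list(other.state)


a, b = Node("A"), Node("B")

# Same events in the same order: the state stays consistent.
for ev in ["lock r1", "unlock r1"]:
    a.deliver(ev)
    b.deliver(ev)
consistent_before = a.state == b.state

# Partition: each side treats the other as failed and sees different events.
a.deliver("B failed")
a.deliver("lock r2")
b.deliver("A failed")
b.deliver("lock r2")
diverged = a.state != b.state

# Merge handled like a fresh join: B takes A's state wholesale.
b.state_transfer_from(a)
consistent_after = a.state == b.state
```

Note that both sides granted "lock r2" while partitioned, which is
exactly the kind of conflicting state that cannot be reconciled by
merging the two logs; one side's state has to be thrown away.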

If the "new" node previously had state because of partition/merge, it must
drop that old state and replace it with the state being transferred to it.
After this, they will be consistent and can continue.  With a simple
process, you might just kill it, restart it and add the transferred state.
But the dlm isn't a process that can simply be restarted; its state is
spread through the applications using it, and through the kernel.  The only
mechanism for resetting the dlm state is resetting the kernel, which means
resetting/rebooting the machine.

> if so, CPG $uuid now, e.g. from the perspective of A, may have only one
> member - A itself, it can perform lockspace now because cluster is
> quorate now (and if we skip fencing); B and C do likewise; then for each
> node, it looks like every node owns this volume; so corruption may happen?

When the nodes are partitioned, the situation is fairly straightforward
-- each node thinks the others are failed, and normal operation is blocked
until recovery happens for the failed nodes.

The harder problem is what to do when they merge.  The dlm effectively
ignores the invalid addition of the merged nodes and calls it a "stateful
merge".  The merged nodes continue to be considered failed (from the
partition) and require a full restart before being added.
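
Roughly, the handling described above could be sketched like this.  This
is only an illustration of the idea, not how dlm_controld implements it:
the Group and Member classes and the has_state flag are hypothetical, and
I'm assuming the joining node somehow reports whether it still holds
state from before the partition.

```python
# Hypothetical sketch; none of these names come from dlm_controld.
class Member:
    def __init__(self, nodeid):
        self.nodeid = nodeid
        self.failed = False
        self.needs_restart = False


class Group:
    def __init__(self, nodeids):
        self.members = {n: Member(n) for n in nodeids}

    def node_left(self, nodeid):
        # A partitioned node is indistinguishable from a failed one.
        self.members[nodeid].failed = True

    def node_joined(self, nodeid, has_state):
        m = self.members.setdefault(nodeid, Member(nodeid))
        if has_state:
            # Stateful merge: ignore the addition and keep treating the
            # node as failed until it has been fully restarted.
            m.failed = True
            m.needs_restart = True
            return "stateful merge"
        # Clean join: the node has no old state, so it can be added
        # (and would receive state transfer from an existing member).
        m.failed = False
        m.needs_restart = False
        return "added"

    def active(self):
        return sorted(n for n, m in self.members.items() if not m.failed)


g = Group([1, 2, 3])
g.node_left(3)                                   # partition: 3 looks failed
merge_result = g.node_joined(3, has_state=True)  # partition heals
active_after_merge = g.active()
rejoin_result = g.node_joined(3, has_state=False)  # after a full restart
active_after_restart = g.active()
```

In this sketch node 3 stays out of the active set after the merge and
only comes back once it rejoins without any old state.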
