On Wed, May 18, 2016 at 02:53:00PM +0800, Eric Ren wrote:
> Q1: what's stateful merged node?
> Q2: what if we add the stateful merged nodes to dlm_controld daemon
> cpg instead of fencing them?

The details here are fundamental to the way dlm works because the dlm
depends on the properties of Virtual Synchrony. Partitions obviously
violate VS. ("Extended" forms of virtual synchrony deal with partitions,
but they are not very practical. Unfortunately, corosync implements one
of these extended forms of VS, which means any application that requires
strict VS has to implement an equivalent of this "stateful merging"
detection that's in the dlm.)

With VS, message/membership events change the state being kept
consistent among nodes. When a partition occurs, nodes have divergent
events and inconsistent state. The partition is simple to understand,
because partitioned nodes are indistinguishable from failed nodes and
are treated as such.

But, if partitioned nodes merge, the inconsistent state has to be made
consistent. This must be done in the same way a new node is added to an
existing node, which means doing "state transfer" from the existing node
to the new node to make the state consistent between them. If the "new"
node previously had state because of partition/merge, it must drop that
old state and replace it with the state being transferred to it. After
this, they will be consistent and can continue.

With a simple process, you might just kill it, restart it and add the
transferred state. But the dlm isn't a process that can simply be
restarted; its state is spread through the applications using it and
through the kernel. The only mechanism for resetting the dlm state is
resetting the kernel, which is resetting/rebooting the machine.

> if so, CPG $uuid now, e.g. from the perspective of A, may has only one
> memeber - A itself, it can perform lockspace now because cluster is
> quorate now (and if we skip fencing); B and C do likewise; then for each
> node, it looks like every node own this volume; so corruption may happen?

When the nodes are partitioned, the situation is fairly straightforward
-- each node thinks the others are failed, and normal operation is
blocked until recovery happens for the failed nodes.

The harder problem is what to do when they merge. The dlm effectively
ignores the invalid addition of the merged nodes and calls it a
"stateful merge". The merged nodes continue to be considered failed
(from the partition) and require a full restart before being added.
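
To make the rule above concrete, here is a minimal toy sketch in Python.
It is not how dlm_controld is implemented; the names Node, Membership,
has_state, node_left and node_joined are all hypothetical and exist only
for illustration. It assumes a joining node can report whether it still
holds old state, and it refuses to add any node that does.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical description of a node attempting to join."""
    nodeid: int
    has_state: bool  # True if the node still holds state from before


@dataclass
class Membership:
    """Toy membership tracker (illustrative only, not dlm_controld)."""
    members: set = field(default_factory=set)
    failed: set = field(default_factory=set)  # nodes needing a restart

    def node_left(self, nodeid):
        # A partitioned node is indistinguishable from a failed node,
        # so it is simply treated as failed.
        self.members.discard(nodeid)
        self.failed.add(nodeid)

    def node_joined(self, node):
        if node.has_state:
            # Stateful merge: the node came back without a restart.
            # Ignore the addition and keep treating it as failed.
            self.failed.add(node.nodeid)
            return "stateful_merge"
        # Clean node (e.g. after a reboot): add it normally; state
        # transfer from the existing members would follow here.
        self.failed.discard(node.nodeid)
        self.members.add(node.nodeid)
        return "added"


m = Membership(members={1, 2, 3})
m.node_left(2)                                          # partition
merged = m.node_joined(Node(2, has_state=True))         # merge, no reboot
after_reboot = m.node_joined(Node(2, has_state=False))  # after restart
```

In this sketch the first join attempt is rejected as a stateful merge and
node 2 stays in the failed set; only the second attempt, made with no old
state, puts it back into the membership.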