> I observe strange problems with fencing when a cluster loses quorum for a
> short time.
> 
> After regaining quorum, fenced reports 'wait state   messages', and the whole
> cluster is blocked waiting for fenced.

Just found the following in fenced/cpg.c:

                /* This is how we deal with cpg's that are partitioned and
                   then merge back together.  When the merge happens, the
                   cpg on each side will see nodes from the other side being
                   added, and neither side will have zero started_count.  So,
                   both sides will ignore start messages from the other side.
                   This causes the domain on each side to continue waiting
                   for the missing start messages indefinitely.  To unblock
                   things, all nodes from one side of the former partition
                   need to fail. */

So is the observed behavior expected?
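
To make the logic in that comment concrete, here is a minimal sketch in C. It is not the actual fenced code: `struct node_view`, `accept_start_from_added_node()` and `missing_starts()` are hypothetical names invented for illustration. The only rule it models is the one the comment implies, namely that a start message from a newly added node is accepted only if that node reports a zero started_count.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of what one side knows about a node
   that cpg just reported as added.  NOT the real fenced structures. */
struct node_view {
    int nodeid;
    int started_count;  /* start cycles the node says it has completed */
};

/* Assumed rule (from the comment): a start message from an added node
   counts only if the node is joining fresh, i.e. started_count == 0. */
static bool accept_start_from_added_node(const struct node_view *sender)
{
    return sender->started_count == 0;
}

/* Count the added nodes whose start messages would be ignored.  While
   this is non-zero, the domain keeps waiting for start messages. */
static int missing_starts(const struct node_view *added, int n)
{
    int missing = 0;

    for (int i = 0; i < n; i++) {
        if (!accept_start_from_added_node(&added[i]))
            missing++;
    }
    return missing;
}
```

With a fresh joiner (started_count 0) nothing is missing and the domain can proceed. After a partition heals, every node from the other side arrives with a non-zero started_count, so all of their start messages are ignored and the wait never ends, which matches the blocked state described above.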



