On Wed, Oct 03, 2012 at 04:12:10PM +0000, Dietmar Maurer wrote:
> > Yes, it's a stateful partition merge, and I think /var/log/messages should
> > have mentioned something about that.  When a node is partitioned from the
> > others (e.g. network disconnected), it has to be cleanly reset before it's
> > allowed back.  "cleanly reset" typically means rebooted.  If it comes back
> > without being reset (e.g. network reconnected), then the others ignore it,
> > which is what you saw.
>
> What message should I look for?

I was wrong; I was thinking about the "daemon node %d stateful merge"
messages, which are logged at debug level but should probably be changed
to error.

> I don't really understand why 'dlm_controld' initiates fencing, although
> the node does not have quorum?
>
> I thought 'dlm_controld' should wait until cluster is quorate before
> starting fence actions?

I guess you're talking about the dlm_tool ls output?  The "fencing" there
does not mean dlm_controld has started fencing anything; it means
dlm_controld is waiting for fenced to finish fencing before it starts dlm
recovery.  fenced, in turn, waits for quorum.

hp2:~# dlm_tool ls
dlm lockspaces
name          rgmanager
id            0x5231f3eb
flags         0x00000004 kern_stop
change        member 3 joined 1 remove 0 failed 0 seq 2,2
members       2 3 4
new change    member 2 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   3 4
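
If you want to check for this state from a script, the following is a
minimal sketch that scans dlm_tool ls output for lockspaces whose
"new status" line shows the "fencing" wait condition.  It assumes the
field layout shown in the output above, which may differ in other dlm
versions; the function name lockspaces_waiting_for_fencing and the
SAMPLE text are illustrative only and not part of any dlm tool.

```python
# Sample text copied from the dlm_tool ls output in this thread.
SAMPLE = """\
dlm lockspaces
name          rgmanager
id            0x5231f3eb
flags         0x00000004 kern_stop
change        member 3 joined 1 remove 0 failed 0 seq 2,2
members       2 3 4
new change    member 2 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   3 4
"""


def lockspaces_waiting_for_fencing(text):
    """Return names of lockspaces whose 'new status' line lists 'fencing'.

    Hypothetical helper: it assumes each lockspace starts with a
    'name <lockspace>' line, as in the sample above.
    """
    waiting = []
    name = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "name" and len(fields) > 1:
            # Remember which lockspace the following lines belong to.
            name = fields[1]
        elif fields[:2] == ["new", "status"] and "fencing" in fields[2:]:
            # dlm recovery for this lockspace is waiting on fenced.
            if name is not None:
                waiting.append(name)
    return waiting


if __name__ == "__main__":
    print(lockspaces_waiting_for_fencing(SAMPLE))
```

On a live node the text would come from running dlm_tool ls rather than
from SAMPLE.  A lockspace reported here is only waiting on fenced; it
says nothing about whether fenced has quorum yet.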