>>> Andrew Beekhof <[email protected]> wrote on 11.04.2013 at 01:05 in message <[email protected]>:
> On 10/04/2013, at 11:54 PM, Ulrich Windl <[email protected]> wrote:
>
> > Hi!
> >
> > I had a situation when one node was periodically fenced when there was a
> > busy network. The node being fenced tried to restart crmd after some
> > problem, and shortly after rejoining the cluster, it was fenced.
>
> The message "Apr 5 14:14:14 h01 crmd: [13080]: ERROR: do_recover: Action
> A_RECOVER (0000000001000000) not supported" is normal but should really be
> changed as it is misleading.
>
> The "real" error is above it:
>
> > Apr 5 14:14:14 h01 crmd: [13080]: ERROR: tengine_stonith_notify: We were
> > alegedly just fenced by h05 for h05!
>
> The rest is pacemaker saying "holy heck" and trying to get out of there
> asap.
> What agent are you using for fencing? Doesn't sound very reliable.
> [...]

We use sbd, and it works very reliably; in fact, it fences the nodes more often than we like ;-) I guess it is some issue with cLVM and mirroring that floods the network and slows down the machine. Then some bugs in pacemaker seem to surface ;-)

One example (note the delicate ordering of tokens ;-):

crmd: [9801]: info: delete_resource: Removing resource prm_xen_v02 for 20705_crm_resource (root) on h01
crmd: [9801]: WARN: decode_transition_key: Bad UUID (crm-resource-20705) in sscanf result (3) for 0:0:crm-resource-20705

Regards,
Ulrich

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
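
One way to read the decode_transition_key warning: the key 0:0:crm-resource-20705 splits cleanly into three colon-separated fields, but the last field is presumably expected to be a 36-character UUID, and "crm-resource-20705" is not one. Below is a minimal Python sketch of that kind of check. The key layout (two integers followed by a UUID), the function name parse_key and the regular expression are assumptions made for illustration; none of this is taken from pacemaker's source.

```python
import re

# Length of a textual UUID (8-4-4-4-12 hex digits plus four hyphens).
UUID_LEN = 36

# Assumed key layout for this sketch only: "<int>:<int>:<uuid>".
KEY_RE = re.compile(r"^(-?\d+):(-?\d+):(\S+)$")


def parse_key(key):
    """Split a colon-separated key into (first, second, uuid, uuid_ok).

    Returns None if the key does not have the assumed three-field shape.
    uuid_ok is True only if the last field has the length of a UUID.
    """
    match = KEY_RE.match(key)
    if match is None:
        return None
    first = int(match.group(1))
    second = int(match.group(2))
    uuid = match.group(3)
    return first, second, uuid, len(uuid) == UUID_LEN


if __name__ == "__main__":
    # The key from the log line: three fields, but the last is not a UUID.
    print(parse_key("0:0:crm-resource-20705"))
```

With the key from the log, all three fields are extracted, but the length check on the last field fails, which matches the shape of the "Bad UUID (...) in sscanf result (3)" warning.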
