>>> Andrew Beekhof <[email protected]> wrote on 11.04.2013 at 01:05 in message
<[email protected]>:

> On 10/04/2013, at 11:54 PM, Ulrich Windl <[email protected]> 
> wrote:
> 
> > Hi!
> > 
> > I had a situation where one node was periodically fenced while the
> > network was busy. The node being fenced tried to restart crmd after some
> > problem, and shortly after rejoining the cluster, it was fenced.
> 
> 
> The message "Apr  5 14:14:14 h01 crmd: [13080]: ERROR: do_recover: Action 
> A_RECOVER (0000000001000000) not supported" is normal but should really be 
> changed as it is misleading.
> 
> The "real" error is above it:
> 
> > Apr  5 14:14:14 h01 crmd: [13080]: ERROR: tengine_stonith_notify: We were alegedly just fenced by h05 for h05!
> 
> The rest is pacemaker saying "holy heck" and trying to get out of there
> asap.
> What agent are you using for fencing?  Doesn't sound very reliable.
> 
[...]

We use sbd, and it works very reliably; in fact, it fences the nodes more often
than we'd like ;-)

I suspect some issue with cLVM and mirroring that floods the network and slows
down the machine. Then some bugs in pacemaker seem to surface ;-)

One example (note the delicate ordering of tokens ;-):
crmd: [9801]: info: delete_resource: Removing resource prm_xen_v02 for 20705_crm_resource (root) on h01
crmd: [9801]: WARN: decode_transition_key: Bad UUID (crm-resource-20705) in sscanf result (3) for 0:0:crm-resource-20705

Regards,
Ulrich


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
