03.09.2013 07:04, Digimer wrote:
...
> To solve problem 1, you can set a delay against one of the nodes. Say
> you set the fence primitive for node 01 to have 'delay="15"'. When node
> 1 goes to fence node 2, it starts immediately. When node 2 starts to
> fence node 1, it sees the 15 second delay and pauses. Node 1 will power
> off node 2 long before node 2 finishes the pause. You can further help
> this problem by disabling acpid on the nodes. Without it, the power-off
> signal from the BMC will be nearly instant, shortening up the window
> where both nodes can initiate a fence.
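
To make the timing argument above concrete, here is a toy model of the
race, not anything Pacemaker actually runs. The function name, the node
names, and the latency figures are made up for illustration. It assumes
a fence call against a node is issued once that node's configured delay
has elapsed, but only if the attacking node still has power at that
moment, and that the victim loses power `poweroff_latency` seconds after
the call.

```python
def survivors(delay_on, poweroff_latency):
    """Toy model of a two-node fence race (hypothetical, for illustration).

    delay_on: dict of node name -> delay in seconds configured on the
              fence primitive that targets that node.
    poweroff_latency: seconds between a fence call being issued and the
              victim actually losing power.
    Returns the set of nodes that still have power afterwards.
    """
    # Order the two targets by how soon the call against them is issued.
    (first, d_first), (second, d_second) = sorted(
        delay_on.items(), key=lambda kv: kv[1]
    )
    first_dies_at = d_first + poweroff_latency

    # The call against `second` is issued by `first`. If `first` still has
    # power when that delay expires, both calls go out and both nodes die.
    if d_second == d_first or d_second < first_dies_at:
        return set()
    return {second}


if __name__ == "__main__":
    print(survivors({"node01": 15, "node02": 0}, poweroff_latency=4))
    print(survivors({"node01": 0, "node02": 0}, poweroff_latency=4))
```

In this model both nodes die whenever the difference between the two
delays is smaller than the power-off latency, so a shorter latency
means a smaller window, and a 15 second delay covers any plausible
latency.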

Does anybody know for sure how and *why* this works? I mean, why does
disabling the userspace ACPI event reader (which just reads what the
kernel sends after hardware events) affect how the hardware behaves?

> 
> To solve problem 2, simply disable corosync/pacemaker from starting on
> boot. This way, the fenced node will be (hopefully) back up and running,
> so you can ssh into it and look at what happened. It won't try to rejoin
> the cluster though, so no risk of a fence loop.

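An enhancement to this would be to re-enable corosync/pacemaker during a
clean shutdown and disable them again after boot. That way a node that
was shut down cleanly rejoins the cluster by itself on the next boot,
while a fenced node still stays out.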
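
A small sketch of the state logic I have in mind, as a toy model only.
The `Node` class and its method names are hypothetical, and real init
scripts would have to flip the autostart setting with whatever service
manager the distribution uses.

```python
class Node:
    """Toy model of the 'enable on clean shutdown, disable after boot' idea."""

    def __init__(self):
        self.autostart = False   # cluster stack is not started on boot
        self.in_cluster = False

    def boot(self):
        # Join only if the previous shutdown re-enabled autostart,
        # then disable autostart again right away.
        self.in_cluster = self.autostart
        self.autostart = False

    def clean_shutdown(self):
        # Shutdown scripts run, so autostart can be re-enabled here.
        self.autostart = True
        self.in_cluster = False

    def fenced(self):
        # Power is cut, no shutdown scripts run, autostart stays disabled.
        self.in_cluster = False


if __name__ == "__main__":
    node = Node()
    node.clean_shutdown()
    node.boot()
    print("after clean reboot, in cluster:", node.in_cluster)
    node.fenced()
    node.boot()
    print("after fence, in cluster:", node.in_cluster)
```

The point of the design is that the autostart flag is only ever set by
code that runs during an orderly shutdown, so a power cut can never
leave it enabled.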

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
