03.09.2013 21:45, Digimer wrote:
> On 03/09/13 14:14, Vladislav Bogdanov wrote:
>> 03.09.2013 07:04, Digimer wrote:
>> ...
>>> To solve problem 1, you can set a delay against one of the nodes. Say
>>> you set the fence primitive for node 01 to have 'delay="15"'. When node
>>> 1 goes to fence node 2, it starts immediately. When node 2 starts to
>>> fence node 1, it sees the 15 second delay and pauses. Node 1 will power
>>> off node 2 long before node 2 finishes the pause. You can further help
>>> this problem by disabling acpid on the nodes. Without it, the power-off
>>> signal from the BMC will be nearly instant, shortening up the window
>>> where both nodes can initiate a fence.
>>
>> Does anybody know for sure how and *why* does it work? I mean why
>> disabling userspace ACPI event reader (which reads just what kernel
>> sends after hardware events) affects how hardware behaves?
>
> Disabling acpid causes, in my experience, the node to instantly power
> down when it receives a power-button event. How/why this happens is
> probably buried in the kernel source and/or ACPI definitions.
This assumes some kind of back-events, which are not part of ACPI, iirc.
And the kernel just translates "forward" ACPI events (bits in a hw port???)
to userspace. It would be interesting to know how they do it...

>
>>> To solve problem 2, simply disable corosync/pacemaker from starting on
>>> boot. This way, the fenced node will be (hopefully) back up and running,
>>> so you can ssh into it and look at what happened. It won't try to rejoin
>>> the cluster though, so no risk of a fence loop.
>>
>> Enhancement to this would be enabling corosync/pacemaker back during the
>> clean shutdown and disabling it after boot.
>
> That would be a good idea, actually. I like that.
>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
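
P.S. To make the timing argument quoted above concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not how pacemaker or any fence agent is implemented: the function name fence_race and every number except the 15 second delay are made up for illustration, and it assumes that a fence action, once started, completes even if the node that started it has already been powered off.

```python
def fence_race(fence_start, power_off_seconds):
    """Toy model of a two-node mutual fence; returns the set of survivors.

    fence_start: hypothetical mapping of node name -> time (seconds) at
        which that node starts its fence action against the peer.
    power_off_seconds: assumed time between starting a fence action and
        the target actually losing power.
    """
    # Order the two nodes by when they start fencing.
    (first, t_first), (second, t_second) = sorted(
        fence_start.items(), key=lambda item: item[1]
    )
    # The later node is powered off at t_first + power_off_seconds.
    # If it manages to start its own fence before that, both nodes go down
    # (assumption: a started fence action always completes).
    if t_second < t_first + power_off_seconds:
        return set()
    return {first}


# delay="15" on the fence primitive for node 1: node 2 pauses 15 s.
print(fence_race({"node1": 0, "node2": 15}, power_off_seconds=4))
# No delay, slow power-off: node 2 starts 1 s later but is still alive.
print(fence_race({"node1": 0, "node2": 1}, power_off_seconds=4))
# No delay, near-instant power-off (the acpid-disabled case as described).
print(fence_race({"node1": 0, "node2": 1}, power_off_seconds=0.5))
```

In this model the window in which both nodes can start a fence is simply power_off_seconds wide, which is why both the delay and a faster power-off help.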
