03.09.2013 21:45, Digimer wrote:
> On 03/09/13 14:14, Vladislav Bogdanov wrote:
>> 03.09.2013 07:04, Digimer wrote:
>> ...
>>> To solve problem 1, you can set a delay against one of the nodes. Say
>>> you set the fence primitive for node 01 to have 'delay="15"'. When node
>>> 1 goes to fence node 2, it starts immediately. When node 2 starts to
>>> fence node 1, it sees the 15 second delay and pauses. Node 1 will power
>>> off node 2 long before node 2 finishes the pause. You can further help
>>> this problem by disabling acpid on the nodes. Without it, the power-off
>>> signal from the BMC will be nearly instant, shortening up the window
>>> where both nodes can initiate a fence.
>>
>> Does anybody know for sure how and *why* it works? I mean, why does
>> disabling the userspace ACPI event reader (which just reads what the kernel
>> sends after hardware events) affect how the hardware behaves?
> 
> Disabling acpid causes, in my experience, the node to instantly power
> down when it receives a power-button event. How/why this happens is
> probably buried in the kernel source and/or ACPI definitions.

This assumes some kind of back-events, which are not part of ACPI,
iirc. And the kernel just translates "forward" ACPI events (bits in a hw
port???) to userspace.

It would be interesting to know how they do it...

> 
>>> To solve problem 2, simply disable corosync/pacemaker from starting on
>>> boot. This way, the fenced node will be (hopefully) back up and running,
>>> so you can ssh into it and look at what happened. It won't try to rejoin
>>> the cluster though, so no risk of a fence loop.
>>
>> An enhancement to this would be to re-enable corosync/pacemaker during a
>> clean shutdown and disable it again after boot.
> 
> That would be a good idea, actually. I like that.
> 
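
As an aside, the delay reasoning quoted earlier in this thread can be
sketched as a toy timing model. This is only an illustration under stated
assumptions: the function name fence_race, the node names and the single
power_off_time parameter are made up here and are not part of pacemaker or
any fence agent. It also assumes that a power-off, once issued, always
completes, and that a node which is already off cannot issue one.

```python
def fence_race(delay_against, power_off_time):
    """Toy model of two nodes trying to fence each other at t=0.

    delay_against maps a node name to the delay (seconds) that its peer
    waits before fencing it (hypothetical stand-in for delay="15").
    power_off_time is how long a node keeps running after the power-off
    is issued (assumed small without acpid, larger with a soft shutdown).
    Returns the sorted list of nodes that are still running afterwards.
    """
    a, b = delay_against  # exactly two node names are expected
    peer = {a: b, b: a}
    death_time = {}
    # Handle the fence attempts in the order they would be issued.
    for target in sorted((a, b), key=lambda n: delay_against[n]):
        attacker = peer[target]
        issue_time = delay_against[target]
        # An attacker that is already off cannot issue its power-off.
        if attacker in death_time and death_time[attacker] <= issue_time:
            continue
        death_time[target] = issue_time + power_off_time
    return sorted(n for n in (a, b) if n not in death_time)


if __name__ == "__main__":
    # Delay of 15s against node1, fast power-off: node1 wins the race.
    print(fence_race({"node1": 15, "node2": 0}, power_off_time=1))
    # No delay on either side: both nodes get powered off.
    print(fence_race({"node1": 0, "node2": 0}, power_off_time=1))
    # Delay set, but power-off takes longer than the delay: both go down.
    print(fence_race({"node1": 15, "node2": 0}, power_off_time=20))
```

In this model the delay only helps while the power-off completes faster
than the delay expires, which is the window being discussed above.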

