Re: [Linux-HA] standby does not take over on multiple power failure

Andrew Beekhof Mon, 04 Jun 2007 04:49:30 -0700

On 6/4/07, Thomas Åkerblom (HF/EBC) <[EMAIL PROTECTED]> wrote:

Hi Andrew.
I'm using 2.0.8-0.15, but I have seen the same behavior in 2.0.7.
In this case ha-9 is DC and also the standby server.
ha-8 has no power, but the standby server does not take over.
The logs begin right before I pulled the power cord.


Actually I do know how to get around this problem now, but I also have some new questions.
If I remove the line:
<nvpair id="default_resource_failure_stickiness" name="default_resource_failure_stickiness" value="-INFINITY"/>
from the cib file, the problem disappears.
I wouldn't expect that parameter to have this effect, rather the opposite.
Is this a known/expected correlation?


not so much "correlation" as "that's what it's designed to do".

setting default_resource_failure_stickiness=-INFINITY means that once
heartbeat finds rscX failed on nodeY, it will never again consider
nodeY a valid place to run rscX... at least not until the admin
"clears" the error by resetting the failcount.
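
to illustrate the effect, here is a rough sketch in Python. it is only a
simplified model, not heartbeat's actual placement code: the INFINITY
value, the function names, and the rule that a node is excluded once its
score reaches -INFINITY are all assumptions made for the example.

```python
# Simplified model of failure stickiness; NOT heartbeat's real code.
# Assumption: scores are integers and INFINITY is a large sentinel value.
INFINITY = 1_000_000


def add_scores(a, b):
    """Add two scores; -INFINITY dominates, then +INFINITY, else clamp."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def failure_penalty(failcount, failure_stickiness):
    """Penalty contributed by past failures of a resource on one node."""
    if failcount == 0:
        return 0  # no failures recorded, so no penalty at all
    if failure_stickiness <= -INFINITY:
        return -INFINITY  # a single failure is enough to ban the node
    return max(-INFINITY, failcount * failure_stickiness)


def node_score(base_score, failcount, failure_stickiness):
    """Score of a node for a resource after applying the failure penalty."""
    return add_scores(base_score, failure_penalty(failcount, failure_stickiness))


def can_run(score):
    """Assumption: a node is excluded once its score reaches -INFINITY."""
    return score > -INFINITY


# One recorded failure with -INFINITY stickiness bans the node outright.
banned = node_score(100, 1, -INFINITY)
# With a finite stickiness, each failure only lowers the score.
lowered = node_score(100, 2, -10)
```

in this model the node stays banned until the failcount goes back to 0,
which matches the "until the admin clears the error" behaviour above.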

in the future we'll expire the failures after "a period of time" but
that is not yet implemented as the lrm doesn't provide the information
to do so.

I would like to set that parameter in order to be able to use the failure counters.
Furthermore I am not able to read and reset the counters using:

crm_failcount -G -U ha-8 -r rsc_lim8
        The result is always 0

crm_failcount -D -U ha-8 -r rsc_lim8
        Error performing operation: The object/attribute does not exist.


later versions return 0 instead of "The object/attribute does not exist."

updated packages for most distros/platforms are available at:
  http://software.opensuse.org/download/server:/ha-clustering/
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
