Re: [Linux-HA] failcount set to INFINITY (1000000) if monitor returns rc=7

Andrew Beekhof Mon, 21 Sep 2009 03:14:38 -0700

On Mon, Sep 21, 2009 at 11:39 AM, Dejan Muhamedagic <[email protected]> wrote:
> Hi,
>
> On Mon, Sep 21, 2009 at 11:15:51AM +0200, Andrew Beekhof wrote:
>> On Fri, Sep 18, 2009 at 12:52 PM, Enno Gröper
>> <[email protected]> wrote:
>> > Hi,
>> > I'm using pacemaker with heartbeat to run a 2 node dhcp server cluster
>> > with shared disk using drbd for the lease file.
>> > After upgrading from using heartbeat 2.1.3 (lenny packages) alone (I
>> > purged the old install and removed rest of the old files by hand) I have
>> > some strange problems.
>> > When stopping the monitored dhcp service using "/etc/init.d/dhcp3-server
>> > stop" pacemaker recognises this as expected, but instead of simply
>> > trying to restart the resource on the same node it leaves it stopped
>> > (the other node is in standby mode).
>> > To achieve what I want (and what I think was default behaviour using
>> > heartbeat 2.1.3) I set migration_threshold to 1.
>> > However failcount is set to INFINITY instead of being increased by 1 so
>> > this doesn't matter.
>> > I thought failcount is only set to INFINITY if failures occur on starting
>> > a resource?
>>
>> With migration-threshold = 1, _any_ failure will force the resource to
>> another node.
>> Including monitor failures.
>
> And if the other node is in standby then the resource remains
> down. I still find that counterintuitive.


I don't see why.
I get that it might not be what you want, but it's a logical consequence of
  If the resource fails N times on nodeX it can't run on nodeX

> To put it differently:
> How to configure pacemaker to always do a failover to another
> node, but to restart the resource in case other nodes are not
> available.

If a small delay is acceptable, then you can use failure-timeout.

But seriously, if the existing node could still host the resource
after a single failure, then why force it to move under any condition?
What benefit do you get from this?
Basically I'd suggest "1" is the wrong value for migration-threshold
in this case.  Set it to 2 to see if a restart helps and if not _then_
force it off (if the other node is down, subsequent restarts are
unlikely to be helpful in the immediate term).
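
To make the rule above concrete, here is a rough toy model of the
bookkeeping, not Pacemaker's actual code. The class and method names
(NodeFailState, record_failure, may_run_here) are made up for
illustration, and the expiry rule is an assumption: the whole failcount
is forgotten once the most recent failure is older than failure-timeout.
It ignores standby nodes, constraints and the recheck interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeFailState:
    """Toy failure bookkeeping for one resource on one node (hypothetical)."""

    migration_threshold: int
    failure_timeout: Optional[float] = None  # seconds; None = never expire
    failcount: int = 0
    last_failure: Optional[float] = None

    def _expire(self, now: float) -> None:
        # Assumption: all recorded failures are dropped once the most
        # recent one is at least failure_timeout seconds old.
        if (
            self.failure_timeout is not None
            and self.last_failure is not None
            and now - self.last_failure >= self.failure_timeout
        ):
            self.failcount = 0
            self.last_failure = None

    def record_failure(self, now: float) -> None:
        self._expire(now)
        self.failcount += 1
        self.last_failure = now

    def may_run_here(self, now: float) -> bool:
        # "If the resource fails N times on nodeX it can't run on nodeX"
        self._expire(now)
        return self.failcount < self.migration_threshold


if __name__ == "__main__":
    strict = NodeFailState(migration_threshold=1)
    strict.record_failure(now=0)
    print("threshold 1, one failure:", strict.may_run_here(now=1))

    lenient = NodeFailState(migration_threshold=2)
    lenient.record_failure(now=0)
    print("threshold 2, one failure:", lenient.may_run_here(now=1))

    timed = NodeFailState(migration_threshold=1, failure_timeout=60)
    timed.record_failure(now=0)
    print("threshold 1, after timeout:", timed.may_run_here(now=60))
```

In this sketch a threshold of 1 bans the node after the first monitor
failure, a threshold of 2 leaves room for one local restart, and
failure-timeout lets the node become eligible again after the delay.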
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
