On Mon, Sep 21, 2009 at 11:39 AM, Dejan Muhamedagic <[email protected]> wrote:
> Hi,
>
> On Mon, Sep 21, 2009 at 11:15:51AM +0200, Andrew Beekhof wrote:
>> On Fri, Sep 18, 2009 at 12:52 PM, Enno Gröper
>> <[email protected]> wrote:
>> > Hi,
>> > I'm using pacemaker with heartbeat to run a 2 node dhcp server cluster
>> > with shared disk using drbd for the lease file.
>> > After upgrading from using heartbeat 2.1.3 (lenny packages) alone (I
>> > purged the old install and removed rest of the old files by hand) I have
>> > some strange problems.
>> > When stopping the monitored dhcp service using "/etc/init.d/dhcp3-server
>> > stop" pacemaker recognises this as expected, but instead of simply
>> > trying to restart the resource on the same node it leaves it stopped
>> > (the other node is in standby mode).
>> > To achieve what I want (and what I think was default behaviour using
>> > heartbeat 2.1.3) I set migration_threshold to 1.
>> > However failcount is set to INFINITY instead of being increased by 1 so
>> > this doesn't matter.
>> > I thought failcount is only set to INFINITY if failures occur on starting
>> > a resource?
>>
>> With migration-threshold = 1, _any_ failure will force the resource to
>> another node.
>> Including monitor failures.
>
> And if the other node is in standby then the resource remains
> down. I still find that counterintuitive.

I don't see why.
I get that it might not be what you want, but it's a logical consequence of:

   If the resource fails N times on nodeX, it can't run on nodeX.

> To put it differently:
> How to configure pacemaker to always do a failover to another
> node, but to restart the resource in case other nodes are not
> available.

If a small delay is acceptable, then you can use failure-timeout.

But seriously, if the existing node could still host the resource
after a single failure, then why force it to move under any condition?
What benefit do you get from this?

Basically I'd suggest "1" is the wrong value for migration-threshold
in this case. Set it to 2 to see if a restart helps and, if not, _then_
force it off (if the other node is down, subsequent restarts are
unlikely to be helpful in the immediate term).
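
To make the rule above concrete, here is a rough toy model in Python. It is only a sketch of the "fails N times on nodeX, can't run on nodeX" rule, not Pacemaker's actual placement logic (no scores, stickiness, or failure-timeout expiry), and the function names and the node names "node1"/"node2" are made up for illustration.

```python
def place_resource(failcounts, threshold, online_nodes):
    """Return the first online node whose failcount is still below
    the migration threshold, or None if every node is ruled out."""
    for node in online_nodes:
        if failcounts.get(node, 0) < threshold:
            return node
    return None


def simulate(threshold, failures, online_nodes=("node1",)):
    """Apply up to `failures` consecutive failures to wherever the
    resource is running and record its location after each one.
    A location of None means the resource stays stopped."""
    failcounts = {}
    current = place_resource(failcounts, threshold, online_nodes)
    history = []
    for _ in range(failures):
        if current is None:
            break  # nothing is running, so nothing more can fail
        failcounts[current] = failcounts.get(current, 0) + 1
        current = place_resource(failcounts, threshold, online_nodes)
        history.append(current)
    return history


if __name__ == "__main__":
    # Only node1 is usable (the peer is in standby).
    print(simulate(threshold=1, failures=1))
    print(simulate(threshold=2, failures=2))
    # Both nodes usable: a threshold of 1 fails over immediately.
    print(simulate(threshold=1, failures=1, online_nodes=("node1", "node2")))
```

In this model, a threshold of 1 with the peer in standby leaves the resource stopped after the first failure, while a threshold of 2 allows one restart in place before the node is ruled out.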
