On Wed, Dec 1, 2010 at 10:08 AM, Andrew Miklas <[email protected]> wrote:
> Hi,
>
> I'm curious how the "on-fail" attribute of a recurring monitor
> operation works. From my testing, it seems that a recurring monitor
> is considered to have failed any time its return doesn't match what
> the cluster believes it should be. That is, if the resource is
> supposed to be running, and the monitor returns with anything other
> than OCF_SUCCESS, then the on-fail action will be taken. Is my
> understanding of this correct?

Yes.

>
> If so, is it possible to have different fail actions depending on the
> sort of failure?

Not currently, no.
I can imagine this would be useful in some situations though.

> Specifically, I'm looking to do an on-fail="restart"
> if the RA comes back with OCF_NOT_RUNNING when Pacemaker believes that
> the resource should be running. However, I'd like on-fail="ignore" if
> the recurring monitor operation comes back with OCF_ERR_GENERIC, or
> times out.

Ignore? Really?
The easiest way to achieve that is to a) set your timeouts insanely high
and b) change the RA to mask out OCF_ERR_GENERIC return values.

>
> To explain -- I'm working with a resource that is adjusted by RPCs
> (still those blasted AWS elastic IPs). On occasion, the external API
> may fail, which will be surfaced to Pacemaker as an OCF_ERR_GENERIC or
> a timeout. A transient API failure isn't itself a cause for alarm --
> the cluster should simply assume that it has the correct view of the
> universe until the API becomes available again. However, if the
> external API indicates with certainty that the resource is down when
> Pacemaker believes it should be up, then we should take corrective
> action immediately.
>
>
> Thanks,
>
>
> Andrew
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
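
As a rough illustration (not code from this thread), the Python sketch below models both points discussed: the cluster applies its single on-fail action to any monitor result other than the expected one, and the suggested workaround masks transient errors inside the RA so that only a definite "not running" triggers recovery. The numeric codes are the standard OCF values; `monitor`, `cluster_reaction` and the `query_api` callable are hypothetical names. A real RA would normally be a shell script, and any internal deadline it puts on the API call has to be shorter than the monitor operation's timeout, otherwise the cluster sees the timeout before the RA can mask it.

```python
# Standard OCF resource agent exit codes used in the thread.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor(query_api):
    """Hypothetical RA monitor that masks transient API failures.

    query_api() is an assumed helper: it returns True if the external API
    reports the resource (e.g. an elastic IP) as attached, False if the API
    says with certainty that it is not, and raises OSError (which includes
    TimeoutError) on a transient error or when the RA's own deadline expires.
    """
    try:
        attached = query_api()
    except OSError:
        # Transient failure: keep the cluster's current view instead of
        # reporting OCF_ERR_GENERIC.
        return OCF_SUCCESS
    return OCF_SUCCESS if attached else OCF_NOT_RUNNING


def cluster_reaction(rc, expected_rc=OCF_SUCCESS, on_fail="restart"):
    """Simplified model of the behaviour confirmed above: any monitor result
    other than the expected one triggers the single on-fail action."""
    return on_fail if rc != expected_rc else None


if __name__ == "__main__":
    def flaky_api():
        raise TimeoutError("API did not answer in time")

    print(cluster_reaction(monitor(lambda: True)))   # None
    print(cluster_reaction(monitor(lambda: False)))  # restart
    print(cluster_reaction(monitor(flaky_api)))      # None
    print(cluster_reaction(OCF_ERR_GENERIC))         # restart (unmasked error)
```

The trade-off is the one raised in the reply: with errors masked, the cluster cannot tell "API unavailable" from "resource healthy", so a real outage that coincides with an API outage goes unnoticed until the API answers again.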
