On Wed, Dec 1, 2010 at 10:08 AM, Andrew Miklas <[email protected]> wrote:
> Hi,
>
> I'm curious how the "on-fail" attribute of a recurring monitor
> operation works.  From my testing, it seems that a recurring monitor
> is considered to have failed any time its return doesn't match what
> the cluster believes it should be.  That is, if the resource is
> supposed to be running, and the monitor returns with anything other
> than OCF_SUCCESS, then the on-fail action will be taken.  Is my
> understanding of this correct?

yes
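To make the behaviour described above concrete, here is a simplified Python model. It is not Pacemaker's actual code. The cluster derives an expected return code from the state it believes the resource is in. Any other code, whatever it is, triggers the single configured on-fail action. The function names are hypothetical. The OCF codes are the standard values.

```python
# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_failed(believed_running: bool, rc: int) -> bool:
    """Simplified model: a recurring monitor counts as failed whenever its
    return code differs from what the cluster expects for the resource."""
    expected = OCF_SUCCESS if believed_running else OCF_NOT_RUNNING
    return rc != expected


def on_fail_action(on_fail: str, believed_running: bool, rc: int):
    """Hypothetical helper: the same on-fail action applies to every kind
    of mismatch; there is no per-return-code choice."""
    return on_fail if monitor_failed(believed_running, rc) else None
```

With `on_fail="restart"` and a resource believed to be running, both `OCF_ERR_GENERIC` and `OCF_NOT_RUNNING` lead to `"restart"`. Only `OCF_SUCCESS` leads to no action.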

>
> If so, is it possible to have different fail actions depending on the
> sort of failure?

Not currently, no.
I can imagine this would be useful in some situations though.

> Specifically, I'm looking to do an on-fail="restart"
> if the RA comes back with OCF_NOT_RUNNING when Pacemaker believes that
> the resource should be running.  However, I'd like on-fail="ignore" if
> the recurring monitor operation comes back with OCF_ERR_GENERIC, or
> times out.

Ignore?  Really?
The easiest way to achieve that is to a) set your timeouts insanely high and
b) change the RA to mask out OCF_ERR_GENERIC return values.
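Below is a minimal Python sketch of option b). It assumes a hypothetical `query_api()` callable that returns True if the external API says the resource is up and False if it says with certainty that it is down. The callable is also assumed to raise the hypothetical `ApiUnavailable`, or `TimeoutError`, on a transient failure. In that case the monitor reports the code it last reported instead of OCF_ERR_GENERIC, so Pacemaker keeps its current view. A definite "down" still returns OCF_NOT_RUNNING and triggers on-fail. A real RA is usually a shell script and runs as a fresh process for each monitor. That means `last_known_rc` would have to be persisted between invocations, for example in a state file, which is left out here. The RA's own API timeout would also need to be shorter than the monitor operation's timeout, so the wrapper gets a chance to mask it.

```python
# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


class ApiUnavailable(Exception):
    """Hypothetical: raised when the external API call fails transiently."""


def masked_monitor(query_api, last_known_rc: int) -> int:
    """Return an OCF code for a monitor, masking transient API failures.

    query_api: hypothetical callable; True = up, False = definitely down,
    raises ApiUnavailable or TimeoutError on a transient failure.
    last_known_rc: the code this monitor reported last time (assumed to be
    persisted elsewhere between monitor runs).
    """
    try:
        up = query_api()
    except (ApiUnavailable, TimeoutError):
        # Transient failure: repeat the previous result rather than
        # returning OCF_ERR_GENERIC, so no on-fail action is triggered.
        return last_known_rc
    # Definitive answer from the API: report it as-is.
    return OCF_SUCCESS if up else OCF_NOT_RUNNING
```

The trade-off is that a real outage which only shows up as API errors goes unnoticed until the API answers definitively again. That is the behaviour asked for below, but it is worth being aware of.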

>
> To explain -- I'm working with a resource that is adjusted by RPCs
> (still those blasted AWS elastic IPs).  On occasion, the external API
> may fail, which will be surfaced to Pacemaker as an OCF_ERR_GENERIC or
> a timeout.  A transient API failure isn't itself a cause for alarm --
> the cluster should simply assume that it has the correct view of the
> universe until the API becomes available again.  However, if the
> external API indicates with certainty that the resource is down when
> Pacemaker believes it should be up, then we should take corrective
> action immediately.
>
>
> Thanks,
>
>
> Andrew
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>