Re: [Linux-HA] Re: Re: Heartbeat and RA monitor functions

Andrew Beekhof Wed, 28 May 2008 22:48:02 -0700

On Wed, May 28, 2008 at 5:26 PM, Joe Bill <[EMAIL PROTECTED]> wrote:
>
>> It is also advisable that it accurately report
>> the service's true state after a start operation
>> and mandatory for a stop.
>> This is easily done by calling a level 0 check
>> in a loop at the end of both functions.
>
> Sure, but HB's business being calling RAs to perform
> all those STARTs and STOPs and MONITORs, *** for the
> sake of consistency,*** I would have thought it to be
> HB's job to call those level 0 checks after a resource
> state changing operation.


Short answer: no
Long answer: This isn't about consistency - it's about the RA doing
whatever it needs to in order to return accurate results.
It just so happens that the easiest way to achieve that is to call a
function it already implements.

We also require the RA to wait until the resource has finished
stopping/starting - otherwise the cluster has the impossible task of
trying to figure out if the resource is broken or just in the middle
of changing state.

So there are a number of reasons why it's a good idea for RAs to do this.
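
As an illustration only, here is a minimal sketch of that pattern in
Python (real RAs are usually shell scripts). The names wait_for_state,
launch, halt and the timeout/interval defaults are hypothetical; the
return codes are the ones from the OCF text quoted further down. The
point is that start and stop only return once the cheap monitor check
agrees with the requested state, or the wait times out.

```python
import time

# Exit codes as quoted from the OCF RA API text in this thread.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def wait_for_state(monitor, wanted, timeout=20.0, interval=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll monitor() until it returns `wanted`; False if the timeout expires."""
    deadline = clock() + timeout
    while True:
        if monitor() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def start(launch, monitor, **wait_args):
    """Launch the service, then block until the monitor reports it running."""
    launch()
    if wait_for_state(monitor, OCF_SUCCESS, **wait_args):
        return OCF_SUCCESS
    return OCF_ERR_GENERIC


def stop(halt, monitor, **wait_args):
    """Halt the service, then block until the monitor reports it stopped."""
    halt()
    if wait_for_state(monitor, OCF_NOT_RUNNING, **wait_args):
        return OCF_SUCCESS  # a successful stop returns 0, not 7
    return OCF_ERR_GENERIC
```

With this shape the cluster never sees a resource that is half way
through a state change: it gets either a confirmed result or an error.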

>
>> Calling more intensive checks is up to the RA
>> writer and in the case of stops, depends on the
>> chances of the level 0 check being incorrect.
>
> Precisely my own thought.
>
> My understanding is that MONITOR may return only the
> following status, (described in OCF RA API section
> 3.6.1):

That section places no restriction on which of those codes monitors
can return.

>
> 0: no error, action succeeded completely
>
> 1: generic or unspecified error (current practice)
> The "monitor" operation shall return this for a
> crashed, hung or otherwise non-functional resource.
>
> 7: program is not running
> Note: This is not the error code to be returned by a
> successful "stop" operation. A successful "stop"
> operation shall return 0. The "monitor" action shall
> return this value only for a _cleanly_ stopped
> resource. If in doubt, it should return 1.
>
> So MONITOR returns:
> 0: resource is running healthy
> 7: resource is stopped
> 1: resource is broken
>
> My confusion lies in what HB does when it receives a
> status=1 after calling a RA MONITOR operation and how
> do I trigger a RECOVER.

You can't.  We don't support it.  At least not yet.
The only way we clean up resources is by stopping and starting them.
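
To make the above concrete, here is a rough Python model of the three
monitor results quoted earlier and of recovery as a plain stop followed
by a start. It is a sketch under assumptions, not Heartbeat's code: the
inputs process_exists, responds and stale_state, and the function names
monitor_status and recover, are all hypothetical.

```python
# Exit codes as quoted from the OCF RA API text in this thread.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_status(process_exists, responds, stale_state):
    """Map hypothetical health observations onto the quoted monitor codes."""
    if process_exists:
        # Running and answering is healthy; running but hung is broken.
        return OCF_SUCCESS if responds else OCF_ERR_GENERIC
    if stale_state:
        # Not running, but leftovers suggest an unclean exit: if in doubt, 1.
        return OCF_ERR_GENERIC
    # Only a cleanly stopped resource reports 7.
    return OCF_NOT_RUNNING


def recover(stop, start):
    """There is no separate recover action: stop, then start."""
    rc = stop()
    if rc != OCF_SUCCESS:
        return rc  # the stop failed, so do not attempt the start
    return start()
```

In this model a monitor result of 1 simply leads to recover(), which is
nothing more than the two operations the RA already implements.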
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
