Re: [Linux-HA] Re: Re: Heartbeat and RA monitor functions

Dejan Muhamedagic Wed, 28 May 2008 08:25:38 -0700

Hi,

On Wed, May 28, 2008 at 07:51:53AM -0700, Joe Bill wrote:
> >> It suggests that the RA implementation of the 'START'
> >> and 'STOP' operations, should include the code as to
> >> perform, before exiting, ALL the tests that are
> >> carried out at ALL implemented check-levels of
> >> monitoring, as to reliably return a resource's
> >> status,
> 
> > It just relies on the RA properly starting or
> > stopping the resource. It's up to the RA to do
> > its job right.
> 
> Sure, but in order to determine the status of a
> resource, and declare that resource as
> "running-healthy", the expected outcome of a START
> operation, it requires the RA to perform *as part of*
> the START operation, the best status verification the
> RA can do, that is, what it does exactly under the
> most detailed monitoring it can perform, namely a
> "MONITOR check-level 20" ?


Note that monitor operations are optional. If they're not
defined, then they're not run. Though they still have to be
supported by an RA for the probes.

> And, if the best status verification (MONITOR) has to
> *always* be performed, what is then the point of
> having  *lesser* performing status verification, all
> regardless of the time it takes for these status
> verifications to complete ?

This is also up to the user, who defines whichever monitor
operations he wants to run. Just because an RA supports them
doesn't mean that they are going to be used. There are too many
different clusters out there.
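
To illustrate how an RA can support several check levels while the
user decides which one actually gets run, here is a minimal sketch. It
is in Python only for readability (most RAs are shell scripts), and the
monitor() function and the checks table are hypothetical names, not
part of any existing RA. It assumes the requested level reaches the RA
in the OCF_CHECK_LEVEL environment variable and that an unset level
means 0.

```python
import os

# OCF return codes used by this sketch.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor(checks, environ=None):
    """Run every check whose level is <= the requested check level.

    `checks` maps a minimum check level (e.g. 0, 10, 20) to a callable
    returning an OCF return code. Both are hypothetical; a real RA
    would hard-code its own tests.
    """
    env = os.environ if environ is None else environ
    try:
        level = int(env.get("OCF_CHECK_LEVEL", "0"))
    except ValueError:
        level = 0  # assumption: treat a malformed level as the cheapest one

    # Cheap checks first; stop at the first failure.
    for min_level in sorted(checks):
        if min_level > level:
            break
        rc = checks[min_level]()
        if rc != OCF_SUCCESS:
            return rc
    return OCF_SUCCESS
```

With a table like {0: pid_alive, 10: port_open, 20: test_query}, a
monitor operation configured at level 0 only runs the first test, and
one configured at level 20 runs all three. Which of these is
scheduled, if any, stays in the cluster configuration.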

IIRC, in most RAs I implemented, the monitor operation is
actually run from within the start to verify if the resource's
really running. If I'm reading your comments correctly, you'd
like to move that step to the resource manager. Anyway, one of
the premises is that an RA knows best what's good for the
resources it's supposed to manage. As I said before, you may file
an enhancement bugzilla and see how it goes.
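
Roughly, the "monitor from within start" pattern looks like the sketch
below. Again it is Python for readability, not taken from any real RA:
start(), launch and monitor are hypothetical names, and the timeout
and polling interval are arbitrary assumptions. The point is only that
start does not report success until the RA's own monitor agrees.

```python
import time

# OCF return codes used by this sketch.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def start(launch, monitor, timeout=20.0, interval=1.0,
          sleep=time.sleep, clock=time.monotonic):
    """Start the resource, then poll monitor() until it reports success.

    `launch` is a callable returning True if the start command could be
    issued; `monitor` is the RA's own monitor function returning an OCF
    return code. Both are placeholders for RA-specific code.
    """
    # start must be idempotent: succeed if the resource already runs.
    if monitor() == OCF_SUCCESS:
        return OCF_SUCCESS

    if not launch():
        return OCF_ERR_GENERIC

    # The resource may need a moment to come up, so poll until the
    # deadline instead of checking only once.
    deadline = clock() + timeout
    while True:
        if monitor() == OCF_SUCCESS:
            return OCF_SUCCESS
        if clock() >= deadline:
            return OCF_ERR_GENERIC
        sleep(interval)
```

Which check level the inner monitor call uses is the RA author's
choice; nothing in this sketch forces it to be the most expensive one.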

> >> This means that repairable states of a resource
> >> should be tested for and repaired at *all*
> >> check-levels instead of expecting Heartbeat to
> >> gradually shift from check-level 0 to
> >> check-level 20.
> 
> > Er, what's a "repairable state"?
> > And who's to repair a resource?
> 
> A state where "it is advantageous to use" a "recover"
> function, provided and advertised by the RA, when
> compared to a stop/start operation.
> 
> OCF RA API, section 3.4.4 "recover"

I think that one's not implemented yet in the CRM. And so far I
haven't seen an RA which supports it. It is commonly assumed that
a resource recovery needs help from humans. That may not always
be true, of course.

Thanks,

Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
