Re: [Linux-HA] Re: Heartbeat and RA monitor functions

Dejan Muhamedagic Wed, 28 May 2008 03:24:44 -0700

Hi,

On Wed, May 28, 2008 at 02:02:44AM -0700, Joe Bill wrote:
> >> I also assume that HB performs a 'monitor
> >> check-level 0' *after* a successful "start"
> >> or "stop".
> >> Is this correct ?
> 
> > No. If the RA exits with success (0) the action's
> > considered to have been successful. (Hmm, how does
> > that sound :)
> 
> Ugh ! Terrible, really ( ;-) ) !
> 
> It suggests that the RA implementation of the 'START'
> and 'STOP' operations, should include the code as to
> perform, before exiting, ALL the tests that are
> carried out at ALL implemented check-levels of
> monitoring, as to reliably return a resource's status,


It just relies on the RA properly starting or stopping the
resource. It's up to the RA to do its job right.
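
To illustrate, here is a minimal sketch of a start action that only
reports success once the resource is confirmed to be up. It is written
in Python purely for illustration (real RAs are usually shell scripts).
The launch and probe callables, the timeout and the poll interval are
all hypothetical; only the OCF exit codes are taken from the OCF
conventions. A single lightweight probe is assumed to be enough here;
start does not need to repeat every check level.

```python
import time

# OCF exit codes
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def start(launch, probe, timeout=5.0, poll=0.1):
    """Start the resource; return success only once probe() confirms it.

    launch and probe are hypothetical callables supplied by the RA:
    launch() kicks off the resource, probe() returns an OCF exit code.
    """
    if probe() == OCF_SUCCESS:
        return OCF_SUCCESS  # already running, so start is a no-op

    launch()

    # Poll the lightweight probe until it passes or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe() == OCF_SUCCESS:
            return OCF_SUCCESS
        time.sleep(poll)

    # The resource never came up: report failure so the exit code
    # of the start action can be trusted.
    return OCF_ERR_GENERIC
```

With something like this, an exit code of 0 from start already means
"the resource is running", so no extra monitor is needed right after it.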

> which negates the advantage of having more than 1
> monitoring function.

All defined (and enabled) monitoring actions are scheduled
immediately after the start action.

> >> Or, I assume that, if HB performs a 'monitor
> >> check-level 0', and that operation returns a
> >> "ERROR" status, HB automatically performs
> >> another 'monitor' of the same resource but with
> >> the next check-level known by HB ...
> 
> > No.
> 
> Ugh!
> 
> This means that repairable states of a resource should
> be tested for and repaired at *all* check-levels
> instead of expecting Heartbeat to gradually shift from
> check-level 0 to check-level 20.

Er, what's a "repairable state"? And who's to repair a resource?

> This significantly increases the duration of
> monitoring at the lower check-levels, which beats the
> purpose of check-levels, *to least impact the QOS*, as
> described in the OCF spec:
> 
> "3.5.3.1. Parameters specific to the 'monitor' action
> 
> OCF_CHECK_LEVEL
> 0 The most lightweight check possible, which should
> not have an impact on the QoS...
> 
> 10 A medium weight check, expected to be called
> multiple times per minute, which should not have a
> noticeable impact on the QoS...
> 
> 20 A heavy weight check, called infrequently, which
> may impact system or service performance..."

Sorry, I must be missing something here. Various check levels are
carried out only if they were configured in the CIB and with
intervals as configured in the CIB. Check levels don't depend
on each other in any way. If any of them reports a problem, it is
trusted and no further checks/monitors are done. I don't see
anything wrong with that. Nor do I see any connection to the
quoted OCF specification text.
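
As a rough sketch of that independence (again in Python for
illustration only, with hypothetical check functions), a monitor action
can simply run the one check that matches the level it was invoked
with. It never escalates to another level by itself; each level runs
only when its own configured monitor operation fires. The environment
variable name follows the OCF text quoted above. Returning
"unimplemented" for an unknown level is an assumption of this sketch,
not something the spec text above prescribes.

```python
import os

# OCF exit codes
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def monitor(checks, environ=None):
    """Run exactly one check, selected by OCF_CHECK_LEVEL.

    checks is a hypothetical mapping {level: callable returning an OCF
    exit code}. environ defaults to the process environment.
    """
    if environ is None:
        environ = os.environ

    level = int(environ.get("OCF_CHECK_LEVEL", "0"))

    check = checks.get(level)
    if check is None:
        # Assumption: a level the RA does not implement is an error.
        return OCF_ERR_UNIMPLEMENTED

    # Whatever this one check reports is the result. There is no
    # automatic fall-through to a heavier or lighter level.
    return check()
```

Each level would then be configured as its own monitor operation with
its own interval, and a failure reported by any one of them is taken at
face value.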

Of course, if you feel that there's a problem or that heartbeat
could be enhanced please file a bugzilla.

Thanks,

Dejan

> 
> 
>       
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
