[Linux-HA] Re: Heartbeat and RA monitor functions

Joe Bill Wed, 28 May 2008 02:29:52 -0700

>> I also assume that HB performs a 'monitor
>> check-level 0' *after* a successful "start"
>> or "stop".
>> Is this correct ?


> No. If the RA exits with success (0) the action's
> considered to have been successful. (Hmm, how does
> that sound :)

Ugh ! Terrible, really ( ;-) ) !

It suggests that the RA's implementation of the 'START'
and 'STOP' operations should itself perform, before
exiting, ALL the tests that are carried out at ALL
implemented check-levels of monitoring, in order to
reliably report the resource's status. That negates
the advantage of having more than one monitoring
function.
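
To make the consequence concrete, here is a minimal sketch
(Python, not taken from any real RA) of what a 'start' would
have to do if Heartbeat trusts its exit code alone. The names
`start`, `launch` and `checks_by_level` are hypothetical; only
the exit codes 0 (success) and 1 (generic error) come from the
OCF conventions.

```python
# Hypothetical sketch: a 'start' action that must verify the resource
# itself, because the cluster manager trusts its exit code alone.
OCF_SUCCESS = 0       # OCF exit code for success
OCF_ERR_GENERIC = 1   # OCF exit code for a generic error


def start(launch, checks_by_level):
    """Launch the resource, then run every check at every level.

    `launch` is a callable that starts the resource (assumption).
    `checks_by_level` maps a check-level (0, 10, 20, ...) to a callable
    returning True when the resource looks healthy (assumption).
    """
    launch()
    # Run the checks from the lightest to the heaviest level.
    for level in sorted(checks_by_level):
        if not checks_by_level[level]():
            return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

Every heavy check ends up inside 'start', which is exactly
the duplication I am complaining about.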

>> Or, I assume that, if HB performs a 'monitor
>> check-level 0', and that operation returns a
>> "ERROR" status, HB automatically performs
>> another 'monitor' of the same resource but with
>> the next check-level known by HB ...

> No.

Ugh!

This means that repairable states of a resource should
be tested for and repaired at *all* check-levels
instead of expecting Heartbeat to gradually shift from
check-level 0 to check-level 20.

This significantly increases the duration of
monitoring at the lower check-levels, which defeats the
purpose of check-levels, namely *to minimize the impact
on QoS*, as described in the OCF spec:

"3.5.3.1. Parameters specific to the 'monitor' action

OCF_CHECK_LEVEL
0 The most lightweight check possible, which should
not have an impact on the QoS...

10 A medium weight check, expected to be called
multiple times per minute, which should not have a
noticeable impact on the QoS...

20 A heavy weight check, called infrequently, which
may impact system or service performance..."
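
For reference, a rough sketch of how a 'monitor' action could
honour these levels, assuming the level arrives in the
OCF_CHECK_LEVEL environment variable and defaults to 0 when
unset. The function and the `checks_by_level` mapping are
hypothetical, and running all lighter checks before a heavier
one is my own assumption, not something the spec mandates.

```python
import os

OCF_SUCCESS = 0       # OCF exit code for success
OCF_ERR_GENERIC = 1   # OCF exit code for a generic error


def monitor(checks_by_level, environ=None):
    """Run only the checks at or below the requested check-level.

    `checks_by_level` maps a level (0, 10, 20) to a callable that
    returns True when the resource looks healthy (assumption).
    """
    if environ is None:
        environ = os.environ
    # Assumption: a missing OCF_CHECK_LEVEL means the lightest check.
    requested = int(environ.get("OCF_CHECK_LEVEL", "0"))
    for level in sorted(checks_by_level):
        if level > requested:
            break  # heavier checks are skipped at this level
        if not checks_by_level[level]():
            return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

With this layout a level-0 call stays cheap, but nothing in
the RA can ask for the next level to be run after a failure;
that decision has to come from whoever schedules the monitor
operations.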



      
