Hi,

On Wed, May 28, 2008 at 02:02:44AM -0700, Joe Bill wrote:
> >> I also assume that HB performs a 'monitor
> >> check-level 0' *after* a successful "start"
> >> or "stop".
> >> Is this correct ?
>
> > No. If the RA exits with success (0) the action's
> > considered to have been successful. (Hmm, how does
> > that sound :)
>
> Ugh ! Terrible, really ( ;-) ) !
>
> It suggests that the RA implementation of the 'START'
> and 'STOP' operations, should include the code as to
> perform, before exiting, ALL the tests that are
> carried out at ALL implemented check-levels of
> monitoring, as to reliably return a resource's status,
It just relies on the RA properly starting or stopping the
resource. It's up to the RA to do its job right.

> which negates the advantage of having more than 1
> monitoring function.

All defined (and enabled) monitoring actions are scheduled
immediately after the start action.

> >> Or, I assume that, if HB performs a 'monitor
> >> check-level 0', and that operation returns a
> >> "ERROR" status, HB automatically performs
> >> another 'monitor' of the same resource but with
> >> the next check-level known by HB ...
>
> > No.
>
> Ugh!
>
> This means that repairable states of a resource should
> be tested for and repaired at *all* check-levels
> instead of expecting Heartbeat to gradually shift from
> check-level 0 to check-level 20.

Er, what's a "repairable state"? And who's to repair a
resource?

> This significantly increases the duration of
> monitoring at the lower check-levels, which beats the
> purpose of check-levels, *to least impact the QOS*, as
> described in the OCF spec:
>
> "3.5.3.1. Parameters specific to the 'monitor' action
>
> OCF_CHECK_LEVEL
> 0 The most lightweight check possible, which should not
> have an impact on the QoS...
>
> 10 A medium weight check, expected to be called
> multiple times per minute, which should not have a
> noticeable impact on the QoS...
>
> 20 A heavy weight check, called infrequently, which may
> impact system or service performance..."

Sorry, I must be missing something here. A check level is
carried out only if it was configured in the CIB, and at the
interval configured in the CIB. Check levels don't depend on
each other in any way. If any of them reports a problem, it is
trusted and no further checks/monitors are done. I don't see
anything wrong with that. Nor do I see any connection to the
quoted OCF specification text.

Of course, if you feel that there's a problem or that heartbeat
could be enhanced please file a bugzilla.
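
To make the division of labour concrete, here is a minimal Python
sketch of how an RA might be organised. It is not Heartbeat code and
not taken from any shipped agent: the function names, the `probes`
mapping and the retry parameters are all made up for illustration.
The only things borrowed from the OCF API are the exit codes and the
OCF_CHECK_LEVEL variable. Running the cheaper probes before the
heavier ones is a design choice of this sketch, since what each level
actually tests is up to the RA author.

```python
import os
import time

# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def check_level_from_env(environ=os.environ):
    """Read the requested check level; assume 0 when none is set."""
    return int(environ.get("OCF_CHECK_LEVEL", "0"))


def monitor(check_level, probes):
    """Run the probes registered at or below check_level, cheapest first.

    `probes` (hypothetical) maps a level such as 0, 10 or 20 to a
    callable that returns an OCF exit code. The first failure is
    returned as-is; nothing escalates to a higher level by itself.
    """
    for level in sorted(probes):
        if level > check_level:
            break
        rc = probes[level]()
        if rc != OCF_SUCCESS:
            return rc
    return OCF_SUCCESS


def start(launch, probes, retries=5, delay=0.0):
    """Launch the resource and report success only once it answers.

    The cluster trusts the exit code of "start", so this sketch polls
    the level-0 probe before returning OCF_SUCCESS.
    """
    launch()
    for _ in range(retries):
        if monitor(0, probes) == OCF_SUCCESS:
            return OCF_SUCCESS
        time.sleep(delay)
    return OCF_ERR_GENERIC
```

In this sketch "start" only needs the lightweight probe to pass. The
heavier levels are left to whatever monitor operations are configured
in the CIB, each running at its own interval.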

Thanks,

Dejan

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
