On Wed, May 28, 2008 at 5:26 PM, Joe Bill <[EMAIL PROTECTED]> wrote:
>
>> It is also advisable that it accurately report
>> the service's true state after a start operation
>> and mandatory for a stop.
>> This is easily done by calling a level 0 check
>> in a loop at the end of both functions.
>
> Sure, but HB's business being calling RAs to perform
> all those STARTs and STOPs and MONITORs, *** for the
> sake of consistency,*** I would have thought it to be
> HB's job to call those level 0 checks after a resource
> state changing operation.
Short answer: no

Long answer: This isn't about consistency - it's about the RA doing
whatever it needs to in order to return accurate results.
It just so happens that the easiest way to achieve that is to call a
function it already implements.

We also require the RA to wait until the resource has finished
stopping/starting - otherwise the cluster has the impossible task of
trying to figure out if the resource is broken or just in the middle
of changing state.

So there are a number of reasons why it's a good idea for RAs to do this.

>
>> Calling more intensive checks is up to the RA
>> writer and in the case of stops, depends on the
>> chances of the level 0 check being incorrect.
>
> Precisely my own thought.
>
> My understanding is that MONITOR may return only the
> following status, (described in OCF RA API section
> 3.6.1):

That section makes no limitation on which of those codes can be
returned by monitors.

>
> 0: no error, action succeeded completely
>
> 1: generic or unspecified error (current practice)
> The "monitor" operation shall return this for a
> crashed, hung or otherwise non-functional resource.
>
> 7: program is not running
> Note: This is not the error code to be returned by a
> successful "stop" operation. A successful "stop"
> operation shall return 0. The "monitor" action shall
> return this value only for a _cleanly_ stopped
> resource. If in doubt, it should return 1.
>
> So MONITOR returns:
> 0: resource is running healthy
> 7: resource is stopped
> 1: resource is broken
>
> My confusion lies in what HB does when it receives a
> status=1 after calling a RA MONITOR operation and how
> do I trigger a RECOVER.

You can't. We don't support it. At least not yet.
The only way we clean up resources is by stopping and starting them.
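
To make the "wait until it has really finished" point concrete, here is a rough sketch of start and stop polling the level 0 check before they return. It is written in Python purely for illustration (a real RA would normally be a shell script), and every name in it (wait_for_state, launch, shutdown, timeout, interval) is made up for the example. The internal timeout is also an assumption of the sketch. Only the return codes 0, 1 and 7 come from the section quoted above.

```python
import time

# Return codes quoted in this thread (OCF RA API section 3.6.1).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def wait_for_state(monitor, wanted, timeout=20.0, interval=1.0, sleep=time.sleep):
    """Poll the level 0 check until it reports `wanted` or we give up.

    `monitor` is any callable returning one of the codes above.
    Returns OCF_SUCCESS once the state is reached, else OCF_ERR_GENERIC.
    The timeout is hypothetical; it only keeps the sketch from looping forever.
    """
    waited = 0.0
    while True:
        if monitor() == wanted:
            return OCF_SUCCESS
        if waited >= timeout:
            return OCF_ERR_GENERIC
        sleep(interval)
        waited += interval


def start(launch, monitor, **kw):
    """Start the service and return 0 only once monitor says it is up."""
    if monitor() == OCF_SUCCESS:
        return OCF_SUCCESS  # already running, nothing to do
    launch()
    return wait_for_state(monitor, OCF_SUCCESS, **kw)


def stop(shutdown, monitor, **kw):
    """Stop the service and return 0 (not 7) only once it is really down."""
    if monitor() == OCF_NOT_RUNNING:
        return OCF_SUCCESS  # already cleanly stopped
    shutdown()
    return wait_for_state(monitor, OCF_NOT_RUNNING, **kw)
```

Note that stop returns 0 on success and never 7, and that both functions reuse the monitor the RA already has, which is the whole reason this is easy to do inside the RA.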
