>>> Lars Marowsky-Bree <[email protected]> wrote on 27.10.2011 at 17:27 in
>>> message <[email protected]>:
> On 2011-10-27T13:39:10, Ulrich Windl <[email protected]> wrote:
>
> > According to the docs, the status/monitor operation may not return
> > OCF_ERR_INSTALLED, but only OCF_ERR_GENERIC if something is wrong (e.g.
> > required software not installed).
>
> That is not correct, where did you read that?

Hi!

As there is no real specification, I'm referring to section 5.3 (monitor
action) of the dev-guide on linux-ha.org.

> Basically, the logic is a bit different - OCF_NOT_RUNNING should be
> returned if the service is down cleanly, regardless of whether binaries
> are present or not.

Yes, but you also need to indicate that a start would not make much sense.
I agree, this is a special case.

> OCF_ERR_GENERIC should only be returned if it is up in some form and not
> cleanly stopped nor running.

Exactly: There are some "resources" that consist of several services
(processes), where it's possible that some services are up and some are
down, but not all of them. I had exactly that case. The current monitor can
be interpreted this way:

rc=0:  "everything is up" or "not everything is down"
rc!=0: "everything is down" or "not everything is up"

So there is no rc for "something is up, something is down" (undetermined
state). That could mean that the resource is in a bad state (should it be
restarted, or should it be stopped?), or the resource could be in a state
transition (from up to down or the other way).

Basically I think there should be more states, like "stopped", "starting",
"started", "stopping". And the monitor should have an additional
"undetermined" state that is different from the "inability to monitor". So
a change from "started" to "undetermined" is most likely a "stopping"
state, while a change from "stopped" to "undetermined" most likely means
"starting".

> > Only start and stop methods are allowed to return OCF_ERR_INSTALLED,
> > possibly preventing a resource start if something is not
> > configured/installed.
>
> Yes, they can also return this.
>
> > Now I'm afraid if the status/monitor returns OCF_ERR_GENERIC on a probe
> > the node is fenced: LRM will try to stop that resource, but the stop
> > will return OCF_ERR_INSTALLED, causing a fence. Right?
>
> Depends. "stop" should return success if the service is cleanly stopped,
> regardless of whether binaries etc are present or not.

OK, but if you need the binary to determine the state, you have a problem
(which is the case with some commercial software). You might argue that
without the binary the software isn't installed, and thus cannot be
running. But then you have just pushed the problem to the "start" method
(which would fail to start the "not running" resource).

> Returning OCF_ERR_GENERIC for the startup probe is a bad idea, because
> it'll trigger the multi-node recovery logic. (Unless, of course, it is
> indeed up.)

Yes, I'm returning "not running" when unable to determine the state, but
that's not 100% clean.

> > I think the status/monitor should be allowed to return OCF_ERR_INSTALLED.
>
> They can, it'll just cause the service to be started on that node. If
> the installation error is unlikely to be remedied by further
> dependencies having started, that may be just the right answer.

I know the monitor can return anything, but the question is who will
handle the return code. I don't know. Maybe a table of methods and allowed
return codes would be helpful.

Regards,
Ulrich

>
> Regards,
>     Lars

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

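The monitor logic discussed in this message can be illustrated with a small
sketch. It is not taken from any existing resource agent: the helper names
monitor_rc and infer_transition are hypothetical, and the "undetermined"
state is only the proposal made above, not something the cluster stack
implements. The numeric values are the usual OCF return codes. Python is
used for brevity, although real resource agents are normally shell scripts.

```python
# Usual OCF return codes (subset relevant to this thread).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def monitor_rc(services_up):
    """Map the up/down flags of a multi-service resource to a monitor rc.

    Hypothetical helper, following the reading given in the thread:
    all services up -> OCF_SUCCESS, all down (cleanly stopped) ->
    OCF_NOT_RUNNING, a mix of both -> OCF_ERR_GENERIC.
    """
    flags = list(services_up)
    if flags and all(flags):
        return OCF_SUCCESS
    if not any(flags):
        return OCF_NOT_RUNNING
    return OCF_ERR_GENERIC


def infer_transition(previous, current):
    """Guess a transitional state from two successive monitor results.

    Assumption (the proposal from the message): "started" followed by
    "undetermined" most likely means "stopping", and "stopped" followed
    by "undetermined" most likely means "starting".
    """
    if current != "undetermined":
        return current
    if previous == "started":
        return "stopping"
    if previous == "stopped":
        return "starting"
    return "undetermined"
```

For example, a resource with one of two processes running would yield
monitor_rc([True, False]) == OCF_ERR_GENERIC, which is exactly the case
where the cluster cannot tell a failure from a state transition.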