>>> Lars Marowsky-Bree <[email protected]> wrote on 27.10.2011 at 17:27 in
>>> message <[email protected]>:
> On 2011-10-27T13:39:10, Ulrich Windl <[email protected]> 
> wrote:
> 
> > According to the docs, the status/monitor operation may not return
> > OCF_ERR_INSTALLED, but only OCF_ERR_GENERIC if something is wrong (e.g.
> > required software not installed).
> 
> That is not correct, where did you read that?

Hi!

As there is no real specification, I'm referring to section 5.3 (monitor 
action) of the dev-guide on linux-ha.org.

> 
> Basically, the logic is a bit different - OCF_NOT_RUNNING should be
> returned if the service is down cleanly, regardless of whether binaries
> are present or not.

Yes, but you also need to indicate that a start would not make much sense. I
agree, this is a special case.


> 
> OCF_ERR_GENERIC should only be returned if it is up in some form and not
> cleanly stopped nor running.

Exactly: There are some "resources" that consist of several services
(processes), where it's possible that some services are up while others are
down. I had exactly that case.

The current monitor can be interpreted this way:
rc=0: "everything is up" or "not everything is down"
rc!=0: "everything is down" or "not everything is up"

So there is no rc for "something is up, something is down" (undetermined state).
That could mean that the resource is in a bad state (should it be re-started, 
or should it be stopped?), or the resource could be in a state transition (from 
up to down or the other way). Basically I think there should be more states 
like "stopped", "starting", "started", "stopping".
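
To illustrate the multi-service case, here is a minimal sketch of how a monitor
could map the three situations onto the existing return codes. The function
name combined_monitor and the "up"/"down" arguments are assumptions made up for
this example; the numeric values are the standard OCF ones that ocf-shellfuncs
normally provides. The mixed case is mapped to OCF_ERR_GENERIC, following the
"up in some form" rule quoted above.

```shell
#!/bin/sh
# Standard OCF return code values (normally set by ocf-shellfuncs).
: "${OCF_SUCCESS:=0}" "${OCF_ERR_GENERIC:=1}" "${OCF_NOT_RUNNING:=7}"

# Hypothetical helper: each argument is the state of one component
# service, either "up" or "down".
combined_monitor() {
    up=0
    down=0
    for state in "$@"; do
        case "$state" in
            up)   up=$((up + 1)) ;;
            down) down=$((down + 1)) ;;
            *)    return "$OCF_ERR_GENERIC" ;;  # unknown input
        esac
    done
    if [ "$up" -gt 0 ] && [ "$down" -eq 0 ]; then
        return "$OCF_SUCCESS"       # everything is up
    elif [ "$up" -eq 0 ]; then
        return "$OCF_NOT_RUNNING"   # everything is down
    fi
    return "$OCF_ERR_GENERIC"       # something is up, something is down
}
```

Note that the mixed case still cannot be told apart from other generic errors,
which is exactly the gap described above.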

And the monitor should have an additional "undetermined" state that is 
different from the "inability to monitor". So a change from "started" to 
"undetermined" is most likely a "stopping" state, while a change from "stopped" 
to "undetermined" most likely means "starting".
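
As a rough sketch of that idea (not anything the cluster stack implements
today): given the previous and the current monitor result, the likely
transition could be guessed as follows. The function guess_transition and the
state names used as strings are hypothetical, and "undetermined" is the
proposed extra state, not an existing OCF return code.

```shell
#!/bin/sh
# Hypothetical helper: guess the transition from the previous and the
# current monitor result. Prints one state name on stdout.
guess_transition() {
    prev=$1
    cur=$2
    case "$prev:$cur" in
        started:undetermined) echo stopping ;;  # was up, now partly down
        stopped:undetermined) echo starting ;;  # was down, now partly up
        *:undetermined)       echo unknown ;;   # no usable history
        *)                    echo "$cur" ;;    # clear result, take it as is
    esac
}
```

For example, `guess_transition started undetermined` prints "stopping".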

> 
> > Only start and stop methods are allowed to return OCF_ERR_INSTALLED,
> > possibly preventing a resource start if something is not
> > configured/installed.
> 
> Yes, they can also return this.
> 
> > Now I'm afraid if the status/monitor returns OCF_ERR_GENERIC on a probe the
> > node is fenced: LRM will try to stop that resource, but the stop will return
> > OCF_ERR_INSTALLED, causing a fence. Right?
> 
> Depends. "stop" should return success if the service is cleanly stopped,
> regardless of whether binaries etc are present or not.

OK, but if you need the binary to determine the state, you have a problem
(which is the case with some commercial software). You might argue that without
the binary the software isn't installed, and thus cannot be running. But then
you have just pushed the problem to the "start" method (which would fail to
start the "not running" resource).
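
A minimal sketch of that behaviour, assuming a hypothetical vendor control tool
at $APP_BIN with "status", "start" and "stop" subcommands (the path, the
subcommands and the app_* function names are all made up for illustration):
monitor and stop treat a missing binary as "cleanly stopped", and only start
reports the installation problem.

```shell
#!/bin/sh
# Standard OCF return code values (normally set by ocf-shellfuncs).
: "${OCF_SUCCESS:=0}" "${OCF_ERR_INSTALLED:=5}" "${OCF_NOT_RUNNING:=7}"

# Hypothetical path of the vendor tool needed to query the state.
APP_BIN=${APP_BIN:-/nonexistent/appctl}

app_monitor() {
    # Without the binary the state cannot be queried; assume not running.
    [ -x "$APP_BIN" ] || return "$OCF_NOT_RUNNING"
    "$APP_BIN" status >/dev/null 2>&1 && return "$OCF_SUCCESS"
    return "$OCF_NOT_RUNNING"
}

app_stop() {
    # Nothing installed means nothing to stop: report success.
    [ -x "$APP_BIN" ] || return "$OCF_SUCCESS"
    "$APP_BIN" stop
}

app_start() {
    # This is where the problem ends up: start cannot succeed.
    [ -x "$APP_BIN" ] || return "$OCF_ERR_INSTALLED"
    "$APP_BIN" start
}
```

With the binary missing, app_monitor returns 7, app_stop returns 0, and
app_start returns 5.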

> 
> Returning OCF_ERR_GENERIC for the startup probe is a bad idea, because
> it'll trigger the multi-node recovery logic. (Unless, of course, it is
> indeed up.)

Yes, I'm returning "not running" when unable to determine the state, but that's
not 100% clean.

> 
> > I think the status/monitor should be allowed to return OCF_ERR_INSTALLED.
> 
> They can, it'll just cause the service to be started on that node. If
> the installation error is unlikely to be remedied by further
> dependencies having started, that may be just the right answer.

I know, the monitor can return anything, but the question is who will handle 
the return code. I don't know. Maybe a table of methods and allowed return 
codes would be helpful.
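
Something along these lines is what I have in mind. The entries below are
collected only from what has been said in this thread, so the table is
incomplete and is not taken from any specification; the function names
allowed_rcs and rc_is_allowed are made up for the example.

```shell
#!/bin/sh
# Illustration only: return codes per action as mentioned in this thread.
# This is NOT a complete or authoritative list.
allowed_rcs() {
    case "$1" in
        monitor|status)
            echo "OCF_SUCCESS OCF_NOT_RUNNING OCF_ERR_GENERIC OCF_ERR_INSTALLED" ;;
        start)
            echo "OCF_SUCCESS OCF_ERR_INSTALLED" ;;
        stop)
            echo "OCF_SUCCESS OCF_ERR_INSTALLED" ;;
        *)
            echo "" ;;
    esac
}

# Succeeds (exit status 0) if return code name $2 is listed for action $1.
rc_is_allowed() {
    for name in $(allowed_rcs "$1"); do
        [ "$name" = "$2" ] && return 0
    done
    return 1
}
```

A real table would also have to say how each code is handled by the caller,
which is the part I am missing.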

Regards,
Ulrich

> 
> 
> Regards,
>     Lars



 

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
