Re: [Linux-HA] Distinguish probe and monitor

Peter Kruse Thu, 19 Apr 2007 12:52:39 -0700

Hello,

thanks for this discussion.


Andrew Beekhof wrote:

On 4/19/07, Peter Kruse wrote:

the PE makes zero distinction between them and since it's the one
doing the asking i believe that it is its "meaning" that counts.


yes, think so, too.

> both ask the same question: "Is this resource running?"

yes, but the consequence to this question is different,
depending on the answer:


sometimes, but that's not the RA's business


i don't agree and right now we're talking about
logging which is not the only reason why the RAs
should know more about what's going on.


on the one hand, "running" can mean heartbeat has to stop it,
because it shouldn't be running
on the other hand, "running" has no consequence because
that's how it's supposed to be.


you're making an artificial distinction


it's not artificial but a distinction that's important in practice.

we can also probe in situations where we expect the resource to be running

repeat after me
- I can not and should not infer anything from the fact that an
operation is a "probe"


no, i won't repeat it :-P

That's exactly the problem, the RA should find out
when to log the correct message.


no, it shouldn't.

you're putting cluster policy into the RA and it doesn't belong there.


i don't agree.


only the top-level cluster-aware pieces of the cluster know enough to
know if the result of an action was good or bad.

this is precisely why the crmd process doesn't log ERRORs just because
they didn't exit with OCF_SUCCESS (and instead relies on the TE and PE
to detect the error conditions)

which is wrong and confusing, because it's not
an error at this stage.


my point exactly - the RA does not (and can not) ever have enough
information to log those ALERT!! ALERT!! messages accurately for
monitor actions.
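
The argument above can be sketched as follows: the same monitor result is good or bad depending on what the cluster expected, and the agent never sees that expectation. This is a minimal illustration only. The function `classify` and its messages are hypothetical and not part of Heartbeat or the CRM; the two numeric values are the standard OCF exit codes.

```python
# Hypothetical sketch: classify() is illustrative, not a Heartbeat/CRM API.
OCF_SUCCESS = 0       # OCF exit code: resource is running
OCF_NOT_RUNNING = 7   # OCF exit code: resource is cleanly stopped


def classify(rc, expected_running):
    """Return the severity a cluster-aware layer might assign to a monitor result.

    expected_running is known only to the cluster layer (PE/TE in this
    thread), never to the resource agent that produced rc.
    """
    if rc == OCF_SUCCESS:
        if expected_running:
            return "ok"
        return "error: active where it should not be"
    if rc == OCF_NOT_RUNNING:
        if expected_running:
            return "error: resource stopped unexpectedly"
        return "ok"
    return "error: monitor failed with rc=%d" % rc
```

With this split, "not running" from a probe on a node where the resource should be stopped is classified as "ok", while the same exit code from a recurring monitor on the active node is an error.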


maybe in the context but they know nothing (and should not know)
about the resources, only the RAs know.  They know the reason
why a resource is failing, and there can be several reasons for
one resource not running.  for example, apache can be failing
because the config file is incorrect, the ip address is not up,
too many requests, oh! whatever, it's the RA's job to find it
out.
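
The point about resource-specific knowledge can be sketched like this: the agent reports why the resource is not running, at a neutral level, and leaves the judgement of whether that is a problem to its caller. This is a rough illustration under stated assumptions. The function `monitor` and its boolean parameters are hypothetical; a real agent would run the checks itself, and the choice of exit code for each case is an assumption, not taken from any existing agent.

```python
# Hypothetical sketch: monitor() and its parameters are illustrative only.
OCF_SUCCESS = 0       # OCF exit code: resource is running
OCF_NOT_RUNNING = 7   # OCF exit code: resource is cleanly stopped


def monitor(process_alive, config_ok, ip_up):
    """Return (exit_code, reason) for an apache-like resource.

    The reason is a plain description of what the agent found. It carries
    no severity, because the agent does not know what was expected.
    """
    if process_alive:
        return OCF_SUCCESS, "running"
    if not config_ok:
        return OCF_NOT_RUNNING, "not running: config file is incorrect"
    if not ip_up:
        return OCF_NOT_RUNNING, "not running: ip address is not up"
    return OCF_NOT_RUNNING, "not running"
```

A caller that knows the expected state could then log the reason as informational or as an error, which keeps the detail only the agent has without putting cluster policy into it.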


let the PE and TE do that for you - they see everything in context.


you're not saying that PE or TE should do the logging?  or
that the RAs should not log messages themselves?  Only the
RAs can log the correct message; PE and TE only know
"success" or "failure", and that's simply not enough.

>> Convinced?
>
> nope :-)

still not?


afraid not


still trying... ;)