Re: [Linux-HA] Distinguish probe and monitor

Andrew Beekhof Thu, 19 Apr 2007 09:13:16 -0700

On 4/19/07, Peter Kruse <[EMAIL PROTECTED]> wrote:

Hi,


Andrew Beekhof wrote:
> On 4/19/07, Peter Kruse <[EMAIL PROTECTED]> wrote:
>> The point is:  the _meaning_ of probe is different to monitor:
>
> no, it's not.  trust me :-)

what do you mean?  of course it's different!


i mean that there is no difference whatsoever between a probe and
any other monitoring action (except that the regular monitor is
recurring and the probe is not).

the PE (Policy Engine) makes zero distinction between them, and since
it's the one doing the asking, i believe it's the PE's "meaning" that counts.

> both ask the same question: "Is this resource running?"

yes, but the consequence to this question is different,
depending on the answer:


sometimes, but that's not the RA's business


on the one hand, "running" can mean heartbeat has to stop it,
because it shouldn't be running
on the other hand, "running" has no consequence because
that's how it's supposed to be.


you're making an artificial distinction
we can also probe in situations where we expect the resource to be running

repeat after me
- I cannot and should not infer anything from the fact that an
operation is a "probe"
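
A minimal sketch of what this means for an RA's monitor action, written in Python for illustration. The pidfile-based check and the name monitor() are assumptions made for the example, not anything from this thread; only the two exit codes are the standard OCF values. The function answers "is this resource running?" the same way whether it is called as a probe or as a recurring monitor, and it never logs an alert.

```python
import os

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def monitor(pidfile):
    """Report whether the resource is running, and nothing more.

    Hypothetical pidfile-based check (POSIX only). The function does not
    know or care whether the cluster called it as a probe or as a
    recurring monitor, and it does not log "not running" as an error.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # No pidfile, or unreadable contents: treat as not running.
        return OCF_NOT_RUNNING
    if pid <= 0:
        return OCF_NOT_RUNNING
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return OCF_NOT_RUNNING
    except PermissionError:
        pass  # process exists but belongs to another user
    return OCF_SUCCESS
```

Whether OCF_NOT_RUNNING is good or bad news is left entirely to the caller.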


>>
>> on monitor, if the resource is running then it's ok
>> on probe, it is generally not ok in the sense that it's
>> generally not what you want.
>> That means, maybe you want your RA to log a message
>> when called as "probe" to write something like:
>>
>> "Resource is running although it shouldn't"
>
> probes don't only happen at startup so this assumption does not hold

That's exactly the problem, the RA should find out
when to log the correct message.


no, it shouldn't.

you're putting cluster policy into the RA and it doesn't belong there.

only the top-level cluster-aware pieces of the cluster know enough to
know if the result of an action was good or bad.

this is precisely why the crmd process doesn't log ERRORs just because
an action didn't exit with OCF_SUCCESS (and instead relies on the TE
(Transition Engine) and PE to detect the error conditions)

Example:
On startup, when the probe is called, all (our) RAs
log error messages like

"ERROR! ERROR! Apache is not running!"

which is wrong and confusing, because it's not
an error at this stage.


my point exactly - the RA does not (and can not) ever have enough
information to log those ALERT!! ALERT!! messages accurately for
monitor actions.

let the PE and TE do that for you - they see everything in context.
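
To illustrate the division of labour, here is a rough Python sketch of how a policy layer could judge a monitor result in context. This is not the actual PE or TE code; evaluate() and its return strings are hypothetical names made up for the example. The point is only that the same exit code is "ok" or a problem depending on what the cluster expected, which the RA never sees.

```python
# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def evaluate(expected_running, rc):
    """Hypothetical policy check: compare the RA's answer with the
    state the cluster expected for this resource on this node."""
    if rc not in (OCF_SUCCESS, OCF_NOT_RUNNING):
        return "failed"   # the RA reported a real error
    observed_running = (rc == OCF_SUCCESS)
    if observed_running == expected_running:
        return "ok"       # matches expectations, nothing to alert about
    # Mismatch: running where it should not be, or down where it should be up.
    return "stop" if observed_running else "recover"
```

For example, evaluate(False, OCF_NOT_RUNNING) gives "ok" (a startup probe finding Apache stopped), while evaluate(True, OCF_NOT_RUNNING) gives "recover".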

It should log a different
message that doesn't alert an admin.
I hope you can agree to that.
>
>> but when called as "monitor" you would probably
>> log nothing on success.
>>
>> Convinced?
>
> nope :-)

still not?


afraid not
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
