Peter Kruse wrote:
> Hello,
> 
> thanks for this discussion.
> 
> Andrew Beekhof wrote:
>> On 4/19/07, Peter Kruse <[EMAIL PROTECTED]> wrote:
>>
>> the PE makes zero distinction between them and since it's the one
>> doing the asking i believe that it is its "meaning" that counts.
> 
> yes, think so, too.
> 
>>
>>> > both ask the same question: "Is this resource running?"
>>>
>>> yes, but the consequence to this question is different,
>>> depending on the answer:
>>
>> sometimes, but thats not the RA's business
> 
> i don't agree and right now we're talking about
> logging which is not the only reason why the RAs
> should know more about what's going on.
> 
>>
>>>
>>> on the one hand, "running" can mean heartbeat has to stop it,
>>> because it shouldn't be running
>>> on the other hand, "running" has no consequence because
>>> that's how it's supposed to be.
>>
>> you're making an artificial distinction
> 
> it's not artificial but a distinction that's important in practice.
> 
>> we can also probe in situations where we expect the resource to be
>> running
>>
>> repeat after me
>> - I can not and should not infer anything from the fact that an
>> operation is a "probe"
> 
> no, i won't repeat it :-P
> 
>>> That's exactly the problem, the RA should find out
>>> when to log the correct message.
>>
>> no, it shouldn't.
>>
>> you're putting cluster policy into the RA and it doesn't belong there.
> 
> i don't agree.
> 
>>
>> only the top-level cluster-aware pieces of the cluster know enough to
>> know if the result of an action was good or bad.
>>
>> this is precisely why the crmd process doesn't log ERRORs just because
>> they didn't exit with OCF_SUCCESS (and instead relies on the TE and PE
>> to detect the error conditions)
>>
>>> which is wrong and confusing, because it's not
>>> an error at this stage.
>>
>> my point exactly - the RA does not (and can not) ever have enough
>> information to log those ALERT!! ALERT!! messages accurately for
>> monitor actions.
> 
> maybe in the context but they know nothing (and should not know)
> about the resources, only the RAs know.  They know the reason
> why a resource is failing, and there can be several reasons for
> one resource not running.  for example, apache can be failing
> because the config file is incorrect, the ip address is not up,
> too many requests, oh! whatever, it's the RAs job to find it
> out.
> 
>>
>> let the PE and TE do that for you - they see everything in context.
> 
> you're not saying that PE or TE should do the logging?  or
> that the RAs should not log messages themselves?  Only the
> RAs can log the correct message, PE and TE only know
> "success" or "failure" that's simply not enough.


There are two separate things here:

        What is the state of the world?

        Is the state of the world a problem?

It is the RA's job to figure out the former and non-judgmentally deliver
up the state of the world.

It is _our_ job to figure out if it's an error.

        If a resource is stopped, it might be an error or it might not.

        If a resource is running, it might be an error, or it might not.

The RA does NOT know, and doesn't need to know.  It just needs to tell
someone what the state of the world is.

Only the Heartbeat infrastructure (CRM, LRM, etc.) knows whether the
result is a problem or not.
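
As a rough illustration of that split, here is a minimal sketch in Python. It is not Heartbeat code: the function names ra_monitor and is_problem are made up for the example, and only the two OCF exit code values (0 for running, 7 for not running) are taken from the OCF conventions.

```python
# Hypothetical sketch: everything except the OCF exit code values is invented
# for illustration and does not correspond to real Heartbeat internals.
OCF_SUCCESS = 0       # resource is running
OCF_NOT_RUNNING = 7   # resource is cleanly stopped


def ra_monitor(process_is_up: bool) -> int:
    """RA side: report the state of the world, with no judgment attached."""
    return OCF_SUCCESS if process_is_up else OCF_NOT_RUNNING


def is_problem(observed_rc: int, expected_rc: int) -> bool:
    """Cluster side: a result is a problem only if it differs from what was expected."""
    return observed_rc != expected_rc


# The same observation gets a different verdict depending on the expectation.
rc = ra_monitor(process_is_up=True)
print(is_problem(rc, expected_rc=OCF_SUCCESS))      # resource was supposed to be running
print(is_problem(rc, expected_rc=OCF_NOT_RUNNING))  # resource was supposed to be stopped
```

The RA function never sees the expected value, which is the point: only the caller holds enough context to turn "running" or "stopped" into "fine" or "error".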

To be honest, we ought to do a MUCH clearer job of highlighting whether
there was an error, and exactly what that error was.

I've actually thought about suppressing the result of a monitor
operation (not logging it) unless the exit code wasn't what was
expected.  This would have certain advantages.
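
A minimal sketch of what that suppression could look like, again in Python and again hypothetical: report_monitor_result and the logger name are assumptions made for the example, not anything that exists in the LRM or CRM today.

```python
import logging

# OCF exit code values; the rest of this sketch is hypothetical.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7

log = logging.getLogger("monitor-sketch")


def report_monitor_result(resource: str, rc: int, expected_rc: int) -> bool:
    """Log a monitor result only when it is unexpected.

    Returns True if a message was logged, False if the result was suppressed.
    """
    if rc == expected_rc:
        # Expected outcome: stay quiet, whether that outcome is
        # "running" or "not running".
        return False
    log.error("monitor of %s returned %d, expected %d", resource, rc, expected_rc)
    return True
```

With something like this, a probe that finds a resource stopped where it is supposed to be stopped produces no log noise at all, while any mismatch is logged together with both the observed and the expected exit code.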



-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
