On Fri, Feb 29, 2008 at 8:50 AM, Hildebrand, Nils, 232
<[EMAIL PROTECTED]> wrote:
> Hi,
>
>  > [...]
>
> > > But back to the original question:
> > >
> > >>>>> Is there a way to tell Linux-HA to retry a failed resource after a
> > >>>>> certain amount of time again? [...]
>  > >
> > > The mentioned cluster also had a feature called "auto-clear", which
> > > would clear the faulted state after some time.
> > > I personally dislike this idea, whereas I think the idea of a
> > > confidence interval, which clears the fail-count if a resource is
> > > online again and has not faulted within that interval, is a good one.
>  >
> > Is it not essentially the same thing, but with a more
> > complicated formula?
>
> No - it's a different matter entirely.
>
> "auto-clear" comes into action only after a resource has failed, which
> in my view should only happen when something is completely wrong and
> can't be fixed automatically (in which case human intervention is
> needed anyway), or when something is wrong with the monitoring methods
> or the cluster setup.
>
> Clearing the fail-count covers a more common situation. The main
> sources here are timeouts of monitoring procedures, a monitor that
> runs too early, or a temporary resource failure that should not cause
> a failover. In these cases a warning would be OK, but human
> intervention is not normally needed.

The only problem is that, to the cluster, all of these are
indistinguishable from "after a resource has failed" in the first
paragraph.

My expectation is that explicitly clearing the fail-count would rarely
be required once failures can be made to expire in some way.
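To make "failures can be expired" concrete, here is a minimal sketch in
Python. It assumes a hypothetical per-resource counter in which every
failure is stored with a timestamp and stops counting once it is older
than a fixed failure_timeout; ExpiringFailCount, should_migrate and the
threshold are illustrative names only, not Linux-HA's API or its actual
implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpiringFailCount:
    """Hypothetical fail-count: each failure expires `failure_timeout`
    seconds after it happened (illustration only, not Linux-HA code)."""

    failure_timeout: float
    _failures: List[float] = field(default_factory=list, init=False, repr=False)

    def record_failure(self, now: float) -> None:
        # Remember *when* the failure happened, not just that it happened.
        self._failures.append(now)

    def count(self, now: float) -> int:
        # Drop failures older than the timeout; assumes `now` never goes backwards.
        self._failures = [t for t in self._failures if now - t < self.failure_timeout]
        return len(self._failures)


def should_migrate(fails: ExpiringFailCount, now: float, threshold: int) -> bool:
    # Hypothetical policy: move the resource only when recent failures pile up.
    return fails.count(now) >= threshold


if __name__ == "__main__":
    fc = ExpiringFailCount(failure_timeout=60.0)
    fc.record_failure(now=0.0)    # e.g. a monitor timeout
    fc.record_failure(now=10.0)   # a second failure shortly after
    print(fc.count(now=20.0), should_migrate(fc, 20.0, threshold=2))
    print(fc.count(now=65.0), should_migrate(fc, 65.0, threshold=2))
    print(fc.count(now=75.0))
```

The counter cannot tell a monitor timeout from a real fault, so both
age out the same way: a one-off blip drops out without anyone clearing
it, while a resource that keeps failing keeps the count at or above the
threshold.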
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
