On Fri, Feb 29, 2008 at 8:50 AM, Hildebrand, Nils, 232 <[EMAIL PROTECTED]> wrote:
> Hi,
>
> > [...]
> > >
> > > But back to the original question:
> > >
> > > >>>>> Is there a way to tell Linux-HA to retry a failed resource after a
> > > >>>>> certain amount of time again? [...]
> > >
> > > The mentioned cluster had also a feature called "auto-clear" which
> > > would clear the faulted-state after some time.
> > > I personally dislike this idea - while I think the idea of a
> > > confidence-interval, which clears the fail-count if a resource has not
> > > faulted and is online again is a good one.
> >
> > is it not essentially the same thing but with a more
> > complicated formula?
>
> No - it's a different pair of shoes.
>
> "auto-clear" comes into action only after a resource has failed - which
> in my point of view should only be the case when there is something
> completely wrong and can't be fixed automatically - but in that case a
> human intervention is needed any way - or there is something wrong with
> monitoring-methods or the cluster-setup.
>
> Clearing the fail-count is a more common situation: Timeouts of
> monitoring procedures, a monitoring that monitors too early, or a
> temporary resource failure which should not cause a failover are here
> the main sources - in these cases a warning would be ok - but human
> intervention is not normally needed.
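
To make the two fail-count ideas in this thread concrete, here is a minimal Python sketch, not Linux-HA code. `ExpiringFailCount` forgets each failure individually after a timeout (one way "failures can be expired"), while `ConfidenceIntervalFailCount` resets the whole count once the resource has gone a full interval without a new failure (the quoted confidence-interval proposal). All class names, `may_run_here`, `threshold` and the plain-seconds clock are hypothetical, and the sketch assumes the resource was restarted after each failure and has stayed online since.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpiringFailCount:
    """Hypothetical model: every failure is forgotten failure_timeout seconds after it happened."""
    failure_timeout: float
    failures: List[float] = field(default_factory=list)

    def record_failure(self, now: float) -> None:
        self.failures.append(now)

    def count(self, now: float) -> int:
        # Age out old failures one by one; the reason for each failure is never consulted.
        self.failures = [t for t in self.failures if now - t < self.failure_timeout]
        return len(self.failures)


@dataclass
class ConfidenceIntervalFailCount:
    """Hypothetical model: the whole count is reset once `interval` seconds pass with no new failure."""
    interval: float
    failures: int = 0
    last_failure: float = float("-inf")

    def record_failure(self, now: float) -> None:
        self.failures += 1
        self.last_failure = now

    def count(self, now: float) -> int:
        # Assumes the resource was restarted after its last failure and has stayed online since.
        if self.failures and now - self.last_failure >= self.interval:
            self.failures = 0
        return self.failures


def may_run_here(counter, now: float, threshold: int) -> bool:
    """Hypothetical placement rule: the node stays eligible while the live count is below threshold."""
    return counter.count(now) < threshold


if __name__ == "__main__":
    expiring = ExpiringFailCount(failure_timeout=60)
    confidence = ConfidenceIntervalFailCount(interval=60)
    for t in (0, 30):  # two failures, at t=0s and t=30s
        expiring.record_failure(t)
        confidence.record_failure(t)

    print(expiring.count(75), confidence.count(75))  # 1 2
    print(expiring.count(95), confidence.count(95))  # 0 0
    print(may_run_here(confidence, 95, threshold=1))  # True
```

At t=75s the two models disagree (one live failure versus two), and by t=95s both have recovered. Note that neither model can tell a monitor timeout from a real fault, since the cluster only ever sees "the resource failed". Later Pacemaker releases expose per-resource `failure-timeout` and `migration-threshold` meta attributes for this kind of policy; check the documentation for the version in use.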

The only problem is that, to the cluster, all of these are indistinguishable from the "after a resource has failed" case in your first paragraph.

My expectation would be that clearing the fail-count would rarely be required once failures can be expired in some way.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
