Hi Lars,

On Wednesday, 30 November 2005 10:33, Lars Marowsky-Bree wrote:
> On 2005-11-22T13:29:06, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > > How do I get rid of that restriction?
> >
> > once you have fixed whatever the problem was on node 3, you can use:
> >    crm_resource -C -H node3 -r name_of_the_resource_that_failed_there
>
> BTW, in a long-running cluster, even with sporadic failures, this can
> accumulate over time. Say that, for whatever reason, we have a start
> failure (or, in the future, a monitor failure causing a failover) once
> every two months or so. Still, after a year, all nodes will suddenly
> have been "exhausted".
>
> What I'm trying to say is that this "resource foo failed somewhere"
> marker should probably expire after a timeout, say a day or so by default.
>
> Is that a sane suggestion? If so, I'll go file a bugzilla.
>
>
> Sincerely,
>     Lars Marowsky-Brée <[EMAIL PROTECTED]>

How about adjusting the score of a resource location constraint on each
failure, i.e. adding a penalty for the node where the resource failed?
Then, over time, the most robust nodes would be the most likely to get the
resources.
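To make the idea concrete, here is a rough Python sketch that also folds in
your timeout suggestion: every failure lowers that node's score for the
resource, and failures older than the timeout stop counting. All names and
numbers (FAILURE_PENALTY, a one-day FAILURE_TIMEOUT, FailureHistory,
best_node) are hypothetical and only illustrate the policy. This is not CRM
code.

```python
from dataclasses import dataclass, field

# Hypothetical tuning knobs, chosen only for illustration.
FAILURE_PENALTY = 100           # score subtracted per recent failure
FAILURE_TIMEOUT = 24 * 60 * 60  # seconds; failures older than this are forgotten


@dataclass
class FailureHistory:
    """Failure timestamps (in seconds) per (resource, node) pair."""
    failures: dict = field(default_factory=dict)

    def record(self, resource: str, node: str, now: float) -> None:
        self.failures.setdefault((resource, node), []).append(now)

    def penalty(self, resource: str, node: str, now: float) -> int:
        # Only failures younger than FAILURE_TIMEOUT lower the node's score.
        recent = [t for t in self.failures.get((resource, node), [])
                  if now - t < FAILURE_TIMEOUT]
        return -FAILURE_PENALTY * len(recent)


def best_node(history, resource, nodes, base_scores, now):
    # Base location score plus failure penalty; max() keeps the first node on ties.
    return max(nodes, key=lambda n: base_scores.get(n, 0)
               + history.penalty(resource, n, now))


history = FailureHistory()
history.record("www", "node3", now=0)
history.record("www", "node3", now=10)
history.record("www", "node1", now=5)
nodes = ["node1", "node2", "node3"]
print(best_node(history, "www", nodes, {}, now=100))     # node2
print(best_node(history, "www", nodes, {}, now=100000))  # node1 (all failures expired)
```

With two recent failures on node3 and one on node1, node2 wins. Once every
failure is older than a day, the penalties disappear and the nodes are equal
again, so no node stays "exhausted" forever.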
But as a first step, I'd just use a script that lists the nodes with failure
constraints, then remove them with Andrew's command as soon as the cause of
the failure is gone. As an admin, I have to look at those events anyway.

Regards,

Joachim Banzhaf
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
