Hi Lars,

On Wednesday, 30 November 2005 10:33, Lars Marowsky-Bree wrote:
> On 2005-11-22T13:29:06, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
>
> > > How do I get rid of that restriction?
> >
> > once you have fixed whatever the problem was on node 3, you can use:
> > crm_resource -C -H node3 -r name_of_the_resource_that_failed_there
>
> BTW, in a long running cluster, even with sporadic failures, this can
> accumulate over time. Say that due to whatever reason we have a start
> failure (or, in the future, a monitor failure causing a failover) once
> every two months or so. Still, after a year, suddenly all nodes will
> have been "exhausted".
>
> What I'm trying to say is that this "resource foo failed somewhere"
> probably should have a timeout - say, a day or something by default.
>
> Is that a sane suggestion? If so, I'll go file a bugzilla.
>
>
> Sincerely,
> Lars Marowsky-Brée <[EMAIL PROTECTED]>

How about adding a (negative) score to the resource's location constraint
for the affected node on each failure? Then, over time, the most robust
nodes are the most likely to get the resources.

But as a first step, I'd just use a script that lists the nodes with
failure constraints. Then I'd remove those constraints with Andrew's
command as soon as the cause of the failure is gone. As an admin I have
to look at those events anyway.

Regards,
Joachim Banzhaf

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
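
To make the two ideas in this thread concrete (a per-failure score penalty, and a timeout after which a failure no longer counts), here is a rough Python sketch. It is only a model of the policy, not Heartbeat code: the function names, the penalty of 100 and the data layout are all assumptions made for illustration, and only the one-day timeout comes from Lars's mail.

```python
from datetime import datetime, timedelta

# Hypothetical values: the penalty is made up for this sketch,
# the one-day timeout is the default Lars suggested.
FAILURE_PENALTY = 100
FAILURE_TIMEOUT = timedelta(days=1)


def node_scores(failures, nodes, now,
                timeout=FAILURE_TIMEOUT, penalty=FAILURE_PENALTY):
    """Return {node: score} for one resource.

    failures is a list of (node, time_of_failure) pairs. Every failure
    younger than `timeout` lowers that node's score by `penalty`;
    older failures are ignored, so a node recovers on its own.
    """
    scores = {node: 0 for node in nodes}
    for node, when in failures:
        if node in scores and now - when < timeout:
            scores[node] -= penalty
    return scores


def preferred_node(scores):
    """Pick the node with the highest score; ties go to the lowest name."""
    return min(scores, key=lambda node: (-scores[node], node))


now = datetime(2005, 11, 30, 10, 33)
failures = [
    ("node3", now - timedelta(hours=2)),   # recent, still counts
    ("node3", now - timedelta(hours=1)),   # recent, still counts
    ("node1", now - timedelta(days=60)),   # two months old, expired
]
scores = node_scores(failures, ["node1", "node2", "node3"], now)
best = preferred_node(scores)
```

With these sample failures, node3 carries two penalties while the old failure on node1 has expired, so node1 and node2 both remain fully eligible. Without the timeout, a node that failed once every few months would never get its score back, which is the "exhausted" situation Lars describes.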
