On Mon, Aug 11, 2008 at 10:36:29PM +0000, Todd, Conor wrote:
> > > I've set up a five-node HA cluster running a bunch of services, and
> > > all seems to be going well, although one node insists on telling
> > > everyone else that its running any new resource which is set-up but
> > > not activated. I use crm_resource -C to deal with this situation, so
> > > it's not so bad.
> > >
> > > Anyway, these nodes are HP DL380-G5s, and so I'm using the riloe
> > > STONITH script. I tested it by hand, and it works like a charm.
> >
> > How did you test it?
>
> stonith -t external/riloe -p <param list> -T reset
>
> It worked, so I configured the stonith resource for each
> machine by specifying all of the parameters in the same way as
> I did here.

That's fine then.

> > > I've set up the STONITH resources so that they never run on the same
> > > machine as the one they control. The other day, I artificially caused
> > > a situation in which one of the nodes should have been fenced. The
> > > cluster realized this and "scheduled" it for fencing, but the fence
> > > never happened. I'm wondering what this "scheduling" is, and what
> > > parameters are available to control it?
> >
> > I suppose that when you say "scheduled" you're referring to a
> > log message. That means that the cluster (CRM) decided that a
> > node should be fenced. If that didn't happen then your
> > stonith module doesn't work. There should've been an error
> > message in the logs.
> > You can test your setup using the stonith program (see the
> > stonith(8) man page for details). If it doesn't work as you
> > expect, turn debugging on with the -d option.
>
> stonith on the command-line did work, and I configured the
> stonith resources in the same way.
>
> CRM never got around to actually doing the fencing, and so the
> logs never said anything more than "node x scheduled for
> fencing". It never even tried to fence.

Hmm. AFAIK, if the crm says that then that means that it is going
to do that. Afterwards, you should see sth like:

Jul 20 19:33:32 xen-c stonithd: [14161]: info: client tengine [pid: 15275] want a STONITH operation RESET to node xen-d.

If you don't see this one, then something's very bad. And, if
reset succeeded:

Jul 20 19:33:32 xen-c stonithd: [14161]: info: Succeeded to STONITH the node xen-d: optype=RESET. whodoit: xen-c

Which release do you run?
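
If it helps, here is a rough Python sketch for checking how far a fencing attempt got for a given node. It only looks for the two stonithd messages quoted above; the regular expressions are my guess at a general pattern based on those two lines, and the function name fencing_status is made up for illustration. Other releases may word these messages differently, so treat it as a starting point only.

```python
import re

# Patterns derived from the two stonithd log lines quoted above.
# They are assumptions: other releases may log different wording.
REQUEST = re.compile(
    r"stonithd: .*want a STONITH operation (\w+) to node (\S+?)\.?$")
SUCCESS = re.compile(
    r"stonithd: .*Succeeded to STONITH the node (\S+?): optype=(\w+)")


def fencing_status(lines, node):
    """Return 'succeeded', 'requested', or 'not requested' for node.

    (Hypothetical helper, not part of any cluster tool.)
    """
    status = "not requested"
    for line in lines:
        line = line.strip()
        match = SUCCESS.search(line)
        if match and match.group(1) == node:
            return "succeeded"
        match = REQUEST.search(line)
        if match and match.group(2) == node:
            status = "requested"
    return status


# The two sample lines from this mail.
sample = [
    "Jul 20 19:33:32 xen-c stonithd: [14161]: info: client tengine "
    "[pid: 15275] want a STONITH operation RESET to node xen-d.",
    "Jul 20 19:33:32 xen-c stonithd: [14161]: info: Succeeded to "
    "STONITH the node xen-d: optype=RESET. whodoit: xen-c",
]
print(fencing_status(sample, "xen-d"))
```

To run it against a real log, pass the open log file (whichever file your syslog writes the cluster messages to) instead of the sample list. A result of "not requested" for a node that was scheduled for fencing would match the situation you describe, where stonithd never received the request.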

Thanks,

Dejan

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
