On Mon, Aug 11, 2008 at 10:36:29PM +0000, Todd, Conor wrote:
> > > I've set up a five-node HA cluster running a bunch of services, and
> > > all seems to be going well, although one node insists on telling
> > > everyone else that it's running any new resource which is set up but
> > > not activated.  I use crm_resource -C to deal with this situation, so
> > > it's not so bad.
> > >
> > > Anyway, these nodes are HP DL380-G5s, and so I'm using the riloe
> > > STONITH script.  I tested it by hand, and it works like a charm.
> >
> > How did you test it?
> 
> stonith -t external/riloe -p <param list> -T reset
> 
> It worked, so I configured the stonith resource for each
> machine by specifying all of the parameters in the same way as
> I did here.

That's fine then.

> > > I've set up the STONITH resources so that they never run on the same
> > > machine as the one they control.  The other day, I artificially caused
> > > a situation in which one of the nodes should have been fenced.  The
> > > cluster realized this and "scheduled" it for fencing, but the fence
> > > never happened.  I'm wondering what this "scheduling" is, and what
> > > parameters are available to control it?
> >
> > I suppose that when you say "scheduled" you're referring to a
> > log message. That means that the cluster (CRM) decided that a
> > node should be fenced. If that didn't happen then your
> > stonith module doesn't work. There should've been an error
> > message in the logs.
> > You can test your setup using the stonith program (see the
> > stonith(8) man page for details). If it doesn't work as you
> > expect, turn debugging on with the -d option.
> 
> stonith on the command-line did work, and I configured the
> stonith resources in the same way.
> 
> CRM never got around to actually doing the fencing, and so the
> logs never said anything more than "node x scheduled for
> fencing".  It never even tried to fence.

Hmm. AFAIK, if the CRM logs that, then it is going to fence the
node. Afterwards, you should see something like:

Jul 20 19:33:32 xen-c stonithd: [14161]: info: client tengine [pid: 15275] want a STONITH operation RESET to node xen-d.

If you don't see this one, then something is seriously wrong.

And, if reset succeeded:

Jul 20 19:33:32 xen-c stonithd: [14161]: info: Succeeded to STONITH the node xen-d: optype=RESET. whodoit: xen-c
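
To see quickly how far each fencing attempt got, the logs can be
scanned for these messages. The following is only a rough sketch: the
two stonithd patterns are modelled on the lines quoted above, while the
"scheduled for fencing" pattern is an assumption based on how the
message was paraphrased in this thread, so the wording may need
adjusting for your release. The function name fencing_status is made up
for illustration.

```python
import re

# Assumption: wording taken from the paraphrase "node x scheduled for
# fencing"; the real CRM message may differ between releases.
SCHEDULED = re.compile(r"node (\S+) scheduled for fencing", re.IGNORECASE)

# Modelled on the two stonithd log lines quoted in this message.
REQUESTED = re.compile(
    r"stonithd: .*want a STONITH operation (\w+) to node (\S+?)\.?\s*$"
)
SUCCEEDED = re.compile(
    r"stonithd: .*Succeeded to STONITH the node (\S+?): optype=(\w+)"
)


def fencing_status(lines):
    """Map each node to the furthest fencing stage seen in the log.

    Stages are 'scheduled' (the CRM decided to fence), 'requested'
    (stonithd received the operation) and 'succeeded'.
    """
    status = {}
    for line in lines:
        match = SCHEDULED.search(line)
        if match:
            # Do not downgrade a node that already progressed further.
            status.setdefault(match.group(1), "scheduled")
            continue
        match = REQUESTED.search(line)
        if match:
            status[match.group(2)] = "requested"
            continue
        match = SUCCEEDED.search(line)
        if match:
            status[match.group(1)] = "succeeded"
    return status
```

A node that stays at 'scheduled' matches the situation described above:
the CRM decided to fence, but stonithd never logged the request. To run
it against a log file, pass the open file object, e.g.
fencing_status(open(path)), where path is wherever your syslog or
ha-log ends up.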

Which release do you run?

Thanks,

Dejan


> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
