Re: [Linux-HA] The suicide stonith plugin doesn't work with 2.1.3 (?)

Dejan Muhamedagic Mon, 04 Feb 2008 03:59:23 -0800

Hi,

On Sun, Feb 03, 2008 at 09:09:36PM +0100, Lars Marowsky-Bree wrote:
> On 2008-02-01T15:29:55, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> 
> > >   It turns out that the suicide stonith plugin doesn't work with crm in 
> > > v2.1.3.
> > > 
> > >   The reason is crm stopping all managed resources on the node before
> > > it is fenced. However, when the suicide stonith resource is moved
> > > away, it can't suicide the node anymore.
> > 
> > :)
> 
> I think the "best" answer here is that "suicide" should be a basic
> capability of a cluster member - and running w/o any plugin loaded
> explicitly at all, possibly autoloaded? This would work around this
> bit.


Agreed.

> > >   I saw some discussion about the suicide plugin on this list before,
> > > but it seems nobody has actually used it.
> > > 
> > >   Can we have a workaround for this?
> > The stonith daemon prevents a node from shooting itself. I wonder
> > if it ever worked or, if it did, it must have been with some
> > earlier version of stonithd. One good reason for such a behaviour
> > is that, obviously, the cluster can't get confirmation of such a
> > stonith operation.
> 
> This is not quite true. The cluster cannot get direct confirmation from
> the device which pulled the plug, but we're talking probabilities here.

stonith is an all-or-nothing proposition.

> (There always is a certain probability that even such devices go wrong,
> and report success after either not fencing or fencing the wrong node.)

True, though I don't see what heartbeat can do about it.

> So, as I've explained elsewhere, the suicide plugin could be made so
> robust that indeed it can be trusted - my preferred option would be for
> it to send a coded, non-replayable UDP packet just 1s before committing
> suicide (in the most reliable method available - local hardware
> watchdog and/or directly invoking the kernel), and if the node then
> stops pinging within 3s (or whatever, as long as it is as low-level as
> possible), to indeed report success to the other cluster nodes.

That assumes the node can still communicate with the others, which
is not always the case. Furthermore, it would be rather hard to
implement: stonith never tries to talk to the node which is
to be reset.
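
To make the proposal above concrete, here is a rough Python sketch of
what a "coded, non-replayable" notice and the follow-up silence check
could look like. Nothing here is existing stonithd or heartbeat code:
the shared key, the packet layout, the HMAC/counter scheme and all
names (make_notice, NoticeVerifier, fencing_confirmed) are assumptions
made up for illustration, and the UDP transport itself is left out.

```python
import hashlib
import hmac
import struct
import time
from typing import Dict, Optional

# Hypothetical pre-shared cluster key; key distribution is out of scope.
SHARED_KEY = b"example-cluster-key"
# Assumed packet body: node id (u32), counter (u64), unix timestamp (u64).
BODY_FMT = "!IQQ"
BODY_LEN = struct.calcsize(BODY_FMT)
TAG_LEN = hashlib.sha256().digest_size


def make_notice(node_id: int, counter: int, key: bytes = SHARED_KEY,
                now: Optional[float] = None) -> bytes:
    """Build the notice a node would send just before resetting itself."""
    ts = int(time.time() if now is None else now)
    body = struct.pack(BODY_FMT, node_id, counter, ts)
    return body + hmac.new(key, body, hashlib.sha256).digest()


class NoticeVerifier:
    """Accepts a notice only if it is authentic, recent and not a replay."""

    def __init__(self, key: bytes = SHARED_KEY, max_age: float = 5.0) -> None:
        self.key = key
        self.max_age = max_age
        self.last_counter: Dict[int, int] = {}

    def accept(self, packet: bytes, now: Optional[float] = None) -> Optional[int]:
        """Return the sender's node id, or None if the packet is rejected."""
        if len(packet) != BODY_LEN + TAG_LEN:
            return None
        body, tag = packet[:BODY_LEN], packet[BODY_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # forged or corrupted
        node_id, counter, ts = struct.unpack(BODY_FMT, body)
        now = time.time() if now is None else now
        if abs(now - ts) > self.max_age:
            return None  # too old (or clocks too far apart)
        if counter <= self.last_counter.get(node_id, -1):
            return None  # counter already seen: replay
        self.last_counter[node_id] = counter
        return node_id


def fencing_confirmed(notice_time: float, last_heard: float, now: float,
                      window: float = 3.0) -> bool:
    """True if the node went silent within `window` seconds of its notice
    and has stayed silent for at least `window` seconds since."""
    went_silent_in_time = last_heard <= notice_time + window
    silent_long_enough = (now - last_heard) >= window
    return went_silent_in_time and silent_long_enough
```

Note that in this sketch success can only be reported after accept()
has returned a node id. If the notice never arrives because the node
cannot reach its peers, nothing is confirmed, which is exactly the
case raised above.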

Thanks,

Dejan


> Regards,
>     Lars
> 
> -- 
> Teamlead Kernel, SuSE Labs, Research and Development
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
