On 2008-02-04T12:58:47, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > This is not quite true. The cluster cannot get direct confirmation from
> > the device which pulled the plug, but we're talking probabilities here.
> stonith is an all-or-nothing proposition.
Nothing ever is ;-) We're just willing to assume that the probability of
this going wrong is so small that we pretend it is all-or-nothing.
> > So, as I've explained elsewhere, the suicide plugin could be made so
> > robust that indeed it can be trusted - my preferred option would be for
> > it to send a coded, non-replayable UDP packet just 1s before committing
> > suicide (in the most reliable method available - local hardware
> > watchdog and/or directly invoking the kernel), and if the node then
> > stops pinging within 3s (or whatever, as long as it is as low-level as
> > possible), to indeed report success to the other cluster nodes.
> Given that the node can communicate with others and that is not
> always the case.
It is the case for one-node clusters and for stop failures. If the node
is unreachable, the other STONITH mechanisms will kick in, no harm done.
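To make the "coded, non-replayable UDP packet" idea quoted above more concrete, here is a minimal sketch of how such a last-gasp payload might be built and checked. It is not part of any existing stonith plugin. It assumes a shared secret held by all nodes, an HMAC-SHA256 tag over a JSON body, and a random nonce plus timestamp to defeat replays. `CLUSTER_KEY`, `MAX_AGE_S`, `make_last_gasp` and `LastGaspVerifier` are hypothetical names chosen for illustration, and the UDP transport itself is left out.

```python
import hashlib
import hmac
import json
import os
import time
from typing import Optional, Set

# Hypothetical shared secret that every cluster node is assumed to hold.
CLUSTER_KEY = b"example-shared-secret"
# Packets whose timestamp is further than this from "now" are rejected.
MAX_AGE_S = 5.0


def make_last_gasp(node: str, key: bytes = CLUSTER_KEY,
                   now: Optional[float] = None) -> bytes:
    """Payload a node would send about 1s before committing suicide."""
    body = {
        "node": node,
        "ts": time.time() if now is None else now,
        "nonce": os.urandom(16).hex(),  # unique per packet
    }
    raw = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(key, raw, hashlib.sha256).hexdigest()
    return json.dumps({"body": body, "tag": tag}).encode()


class LastGaspVerifier:
    """Receiver side: authenticate the packet and refuse replays."""

    def __init__(self, key: bytes = CLUSTER_KEY, max_age: float = MAX_AGE_S):
        self.key = key
        self.max_age = max_age
        self.seen: Set[str] = set()  # nonces already accepted

    def verify(self, packet: bytes, expected_node: str,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        try:
            outer = json.loads(packet)
            body, tag = outer["body"], outer["tag"]
            raw = json.dumps(body, sort_keys=True).encode()
            expected = hmac.new(self.key, raw, hashlib.sha256).hexdigest()
            # Constant-time comparison of the authentication tag.
            if not hmac.compare_digest(expected, tag):
                return False
            node, ts, nonce = body["node"], float(body["ts"]), body["nonce"]
        except (ValueError, KeyError, TypeError):
            return False  # malformed or unauthenticated packet
        if node != expected_node or not isinstance(nonce, str):
            return False
        if abs(now - ts) > self.max_age:
            return False  # too old (or clock skew too large)
        if nonce in self.seen:
            return False  # replayed packet
        self.seen.add(nonce)
        return True
```

The timestamp window bounds how long the verifier has to remember nonces; a long-running implementation would prune entries from `seen` once they are older than `MAX_AGE_S` instead of letting the set grow forever.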
> Furthermore, it would be rather hard to implement this. stonith never
> tries to talk to the node which is to be reset.
Not that difficult. The node would simply reply to the "who can do it"
query itself, and the requesting node would take the "last gasp" as a
confirmation of the operation.
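The requesting node's side of this could look roughly like the sketch below: if the target itself answered the "who can do it" query, ask it to kill itself, accept a valid last gasp, and only report success if the target then stops answering pings within the deadline (3s, as suggested earlier in the thread). Otherwise it falls back to the other STONITH mechanisms. This is an illustration, not the stonithd API: `fence_node`, `wait_until_silent` and the callbacks `request_self_fence`, `is_alive` and `run_remote` are hypothetical, and trying the self-fence offer before other devices is an assumption. The clock and sleep functions are injectable so the logic can be tested without real delays.

```python
import time
from typing import Callable, Optional, Sequence


def wait_until_silent(is_alive: Callable[[], bool], deadline_s: float,
                      poll_s: float, clock: Callable[[], float],
                      sleep: Callable[[float], None]) -> bool:
    """True if the target stops answering pings before the deadline."""
    end = clock() + deadline_s
    while clock() < end:
        if not is_alive():
            return True
        sleep(poll_s)
    return False


def fence_node(target: str,
               responders: Sequence[str],
               request_self_fence: Callable[[str], bool],
               is_alive: Callable[[], bool],
               run_remote: Callable[[str, str], bool],
               deadline_s: float = 3.0,
               poll_s: float = 0.1,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Return the name of whoever fenced `target`, or None if nobody did.

    responders: nodes that answered the "who can do it" query; `target`
    itself appears here when it offers to commit suicide.
    """
    if target in responders:
        # Ask the target to kill itself; True means a valid (authenticated,
        # non-replayed) last-gasp packet arrived from it.
        if request_self_fence(target) and wait_until_silent(
                is_alive, deadline_s, poll_s, clock, sleep):
            return target
    # Target unreachable, no valid last gasp, or still pinging:
    # fall back to the other STONITH mechanisms.
    for node in responders:
        if node != target and run_remote(node, target):
            return node
    return None
```

Requiring both the last gasp and the subsequent silence means that a node which sends the packet but fails to actually die is never reported as fenced.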
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems