On Fri, Feb 25, 2011 at 12:51 PM, Stallmann, Andreas <[email protected]> wrote:

> Hi!
>
> I've combined both your answers into one mail, I hope that's all right with you.
>
>>> For now, I need an interim solution, which is, as of now, stonith via
>>> suicide.
>>
>> Doesn't work, as suicide is not considered reliable - by definition the
>> remaining nodes have no way to verify that the fencing operation was
>> successful.
>>
>> Suspect it will still fail though; suicide isn't a supported fencing
>> option, since obviously the other nodes can't confirm it happened.
>
> OK then, I know I'm being a little bit provocative right now:
>
> If "suicide" is not a supported fencing option, why is it still included
> with stonith?
Left over from heartbeat v1 days, I guess. Could also be a testing-only
device, like ssh.

> It's badly documented, and I didn't find a single (official) document on
> how to implement a (stable!) suicide stonith,

Because you can't. Suicide is not, will not, and cannot be reliable.

The whole point of stonith is to create a known node state (off) in
situations where you cannot be sure whether your peer is alive, dead or in
some state in between. Suicide does not achieve this in any way, shape or
form: it requires a "sick" node to suddenly start functioning correctly.
Attempting to self-terminate makes some sense; relying on it to succeed
does not seem prudent.

> but it's there, and thus it should be usable. If it isn't, the maintainer
> should please (please!) remove it or supply something that works. I do
> know that's quite demanding, because the maintainer will probably do the
> development in his (or her) free time. Still...
>
> I do agree that "suicide" is a very special way of keeping a cluster
> consistent, very different from the other stonith methods. I wouldn't
> expect it under stonith; I'd rather think...
>
>> Yes, no-quorum-policy=suicide means that all nodes in the partition will
>> end up being shot, but you still require a real stonith device so that
>> _someone_else_ can perform it.
>
> ...that if you set "no-quorum-policy=suicide", the suicide script is
> executed by the node itself. It should be an *extra* feature *besides*
> stonith. The procedure should be something like:
>
> 1) node1: All right, I don't have quorum anymore. Let's wait for a while...
> 2) ... a while passes ...
> 3) node1: OK, I'm still without quorum, no contact to my peers whatsoever.
>    I'd rather shut myself down before I cause a mess.
>
> If, during (2), the other nodes find a way to shut the node down externally
> (whether through ssh, a power switch, a virtualisation host...), that's
> even better, because then the cluster "knows" that it's still consistent.
> I'm with you here.
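For reference, a setup along the lines described above - a real stonith device that *other* nodes can trigger, plus no-quorum-policy=suicide on top - might look roughly like the following crm shell sketch. This is an illustration only: the resource names and the hostlist values are made up, and external/ssh (from cluster-glue) is itself only a testing-grade device.

```shell
# Sketch only - names and hostlist values are illustrative.

# A stonith device that _other_ nodes can use to fence a peer:
crm configure primitive fencing-ssh stonith:external/ssh \
    params hostlist="node1 node2"
crm configure clone fencing-clone fencing-ssh

# Only meaningful with a working stonith device in place:
crm configure property stonith-enabled=true
crm configure property no-quorum-policy=suicide
```

Note that external/ssh suffers from the same objection as suicide: it needs the "sick" node to be healthy enough to accept an ssh connection, which is why it too is only suitable for testing.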
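The three-step procedure quoted above can be sketched as a small shell function. Everything here is hypothetical: check_quorum and self_fence are placeholders for whatever the real stack provides (e.g. "crm_node -q", which prints 1 when the local node is in the quorate partition, and "poweroff") - and, per the objection above, nothing else in the cluster can rely on this loop actually completing.

```shell
# Placeholder implementations - stand-ins for e.g. "crm_node -q" and "poweroff".
check_quorum() { echo 0; }
self_fence()   { echo "self-fenced"; }

# Steps 1-3: notice quorum loss, wait out a grace period, then self-fence.
wait_then_self_fence() {
    grace=$1; waited=0
    while [ "$waited" -lt "$grace" ]; do
        if [ "$(check_quorum)" = "1" ]; then
            echo "quorum regained"       # step 1 resolved itself, do nothing
            return 0
        fi
        sleep 1; waited=$((waited + 1))  # step 2: a while passes
    done
    self_fence                           # step 3: still inquorate, shut down
}
```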
>
> If a split brain happens in a split site scenario, a "suicide" might be the
> only way to keep up consistency, because no one will be able to reach any
> device on the other site... Please correct me if I'm wrong. What do you do
> in such a case? What's your exemplary implementation of Linux-HA then?
>
> On the other hand, it doesn't make sense to name a "no-quorum-policy"
> "suicide" if it's anything but a suicide (if anything, one could name it
> "assisted suicide").
>
> Please correct me: Do I have an utterly wrong understanding of the whole
> process (that could very well be the case), is the implementation not
> entirely thought through, or is the naming of certain components not as
> good as it could be?
>
> I might point you to
> http://osdir.com/ml/linux.highavailability.devel/2007-11/msg00026.html,
> because the same thing was discussed then, and I very much do think that
> Lars was right with what he wrote. Has anything changed in the concept of
> suicide/quorum-loss/stonith since then? That's not a provocative question -
> well, maybe it is, but it's not meant to be.
>
> In addition: Something that's missing from the manuals is a "case study"
> (or something of the like) on how to implement a split site scenario. How
> should the cluster be built then? If you have two sites? If you have one?
> How should the storage replication be set up? Is synchronous replication
> like in drbd really a good idea then, performance-wise? I think I'll
> finally have to buy a book. :-) Any recommendations (either English or
> German preferred)?
>
> Well, thanks a lot again, my brain didn't explode (that's something good,
> I feel), but I'm not entirely happy, though.
>
> Cheers and have a nice weekend,
>
> Andreas

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
