Hi! I've combined both of your answers into one mail; I hope that's all right with you.

> > For now, I need an interim solution, which is, as of now, stonith via
> > suicide.
>
> Doesn't work as suicide is not considered reliable - by definition the
> remaining nodes have no way to verify that the fencing operation was
> successful.

> Suspect it will still fail though, suicide isnt a supported fencing option -
> since obviously the other nodes can't confirm it happened.

OK then, I know I'm being a little provocative right now: if "suicide" is not a supported fencing option, why is it still included with stonith? It's badly documented, and I couldn't find a single (official) document on how to implement a (stable!) suicide stonith, but it's there, and so it should be usable. If it isn't, the maintainer should please (please!) either remove it or supply something that works. I do know that's quite demanding, because the maintainer probably does the development in his (or her) free time. Still...

I also agree that "suicide" is a very special way of keeping a cluster consistent, very different from the other stonith methods. I wouldn't expect it under stonith; I'd rather think...

> Yes no-quorum-policy=suicide means that all nodes in the partition will end
> up being shot, but you still require a real stonith device so that
> _someone_else_ can perform it.

...that if you set "no-quorum-policy=suicide", the suicide script is executed by the node itself. It should be an *extra* feature *besides* stonith. The procedure should be something like:

1) node1: All right, I have no quorum anymore. Let's wait for a while...
2) ... a while passes
3) node1: OK, I'm still without quorum and have no contact with my peers whatsoever. I'd rather shut myself down before I cause a mess.

If, during (2), the other nodes find a way to shut the node down externally (whether through ssh, a power switch, a virtualisation host...), that's even better, because then the cluster "knows" that it's still consistent. I'm with you here.
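
The three steps above can be sketched as a small timer. This is only an illustration of the proposed behaviour, written in Python for brevity: the class `QuorumWatchdog`, its `grace_seconds` parameter and the returned strings are hypothetical names invented here, not an existing Heartbeat, Pacemaker or stonith interface, and how quorum is actually detected is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuorumWatchdog:
    """Hypothetical self-fencing timer; NOT an existing Linux-HA component."""

    grace_seconds: float                 # assumed waiting time for step 2
    lost_since: Optional[float] = None   # time at which quorum was first seen missing

    def observe(self, now: float, has_quorum: bool) -> str:
        """Return 'ok', 'waiting' or 'shutdown' for a single observation."""
        if has_quorum:
            # Quorum is present (again): forget any running timer.
            self.lost_since = None
            return "ok"
        if self.lost_since is None:
            # Step 1: quorum has just been lost, start waiting.
            self.lost_since = now
        if now - self.lost_since >= self.grace_seconds:
            # Step 3: still no quorum after the grace period, shut down.
            return "shutdown"
        # Step 2: the grace period is still running.
        return "waiting"
```

The caller would poll `observe()` with the current time and its own view of quorum, and trigger the local shutdown when it returns `"shutdown"`. If the peers manage to fence the node externally during the grace period, the node simply never reaches that point.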

If a split brain happens in a split-site scenario, a "suicide" might be the only way to maintain consistency, because no one will be able to reach any device on the other site... Please correct me if I'm wrong. What do you do in such a case? What does your exemplary implementation of Linux-HA look like then?

On the other hand, it doesn't make sense to name a "no-quorum-policy" "suicide" if it's anything but a suicide (if at all, one could call it "assisted suicide"). Please correct me: do I have an utterly wrong understanding of the whole process (which could very well be the case), is the implementation not entirely thought through, or is the naming of certain components not as good as it could be?

I might point you to http://osdir.com/ml/linux.highavailability.devel/2007-11/msg00026.html, because the same thing was discussed back then, and I very much think that Lars was right in what he wrote. Has anything changed in the concept of suicide/quorum-loss/stonith since then? That's not a provocative question; well, maybe it is, but it's not meant to be.

In addition: something that's missing from the manuals is a "case study" (or something like it) on how to implement a split-site scenario. How should the cluster be built then? If you have two sites? If you have one? How should the storage replication be set up? Is synchronous replication, as in drbd, really a good idea then, performance-wise?

I think I'll finally have to buy a book. :-) Any recommendations? (Either English or German preferred.)

Well, thanks a lot again. My brain didn't explode (that's a good thing, I feel), but I'm not entirely happy either.

Cheers and have a nice weekend,
Andreas

------------------------
CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.