On Tue, Aug 21, 2007 at 04:34:27PM +0200, Andreas Kurz wrote:
> On 8/21/07, sebastien lorandel <[EMAIL PROTECTED]> wrote:
> > hi,
> >
> > I discovered that stonith needs some hardware; at first I thought
> > it was only a piece of software...
>
> For testing purposes there is also an ssh stonith agent available.
>
> > But
> > 1 - I also read that it is not imperative, so when does it become required?
> > I am installing a two node cluster managing ssh, ip address and conntrackd.
>
> Whenever there is data shared between nodes, stonith is highly
> recommended to avoid data corruption, e.g. when two nodes try to
> access the same file system.
Right. That's the most important function of STONITH: to reboot a
misbehaving node. In some cases it is possible to do that in
software, say over ssh, but that only works while the node is still
reachable and responsive. The only method which guarantees that the
host gets rebooted is a device which is independent of the machine.
The best STONITH devices are, for example, UPS devices which control
the power supply to the computers. Very popular STONITH devices are
the "lights out" type (such as IBM RSA or HP iLO): they are
independent of the computer they control, except for the power
supply, which they share with their host. Hence, they have a serious
problem when there is no power (for example when the only power
supply fails, which is not so uncommon).

> In case of a split brain situation where
> a cluster is split into subclusters with equal node count, stonith is a
> way to regain quorum by resetting one node. The subcluster which 'wins
> the race' to stonith a node in the competing subcluster takes over
> all resources. A two node heartbeat cluster is a special case because
> it always has quorum to allow one single node to run resources. In
> case of a cluster with an odd node count you could also rely on
> quorum only, but the safest way is stonith.
>
> In the worst case you have two nodes running the same ip/ssh/conntrackd
> resources at the same time in your two node cluster if such a split
> brain situation occurs and you don't have stonith configured.
>
> > 2 - And I also don't understand why we need hardware, why isn't it directly
> > implemented in Heartbeat telling a node it should restart?
>
> How should heartbeat decide whether a node is down or whether all
> communication paths to the other node are unavailable? If a stonith
> action is successful it is safe to decide a node is really dead and
> you can be sure it has no resources running. There are e.g. management
> facilities with an extra network port available to allow remote
> restarts of a server in case the server is completely unresponsive, or
> you can use manageable UPS facilities.
>
> > 3 - And then my last question, how can we know if a switch is a STONITH one?
>
> see above
>
> Regards,
> Andreas
>
> > Thanks in advance, maybe these questions can seem stupid to some of you but
> > I didn't see an answer to them in the mailing list and the website.
> >
> > --
> > Sébastien Lorandel
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
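
To make the quorum argument from the thread a bit more concrete, here is a
minimal sketch in Python. It is only a simplified model of the reasoning, not
Heartbeat's actual code: the function names has_quorum and
quorum_alone_decides are made up for illustration, and the two node rule is
modelled exactly as described in the thread (a single node is always allowed
to run resources).

```python
from typing import Sequence


def has_quorum(partition_size: int, total_nodes: int) -> bool:
    """Simplified quorum rule (an assumption, not Heartbeat's real logic).

    A partition has quorum if it holds a strict majority of the nodes.
    A two node cluster is the special case from the thread: a single
    surviving node is always allowed to run resources.
    """
    if total_nodes == 2:
        return partition_size >= 1
    return 2 * partition_size > total_nodes


def quorum_alone_decides(partition_sizes: Sequence[int]) -> bool:
    """Return True if exactly one partition ends up with quorum.

    If no partition has quorum (even split) or more than one has it
    (two node split brain), quorum cannot pick a winner and fencing
    (STONITH) is what resolves the situation.
    """
    total = sum(partition_sizes)
    with_quorum = [s for s in partition_sizes if has_quorum(s, total)]
    return len(with_quorum) == 1


if __name__ == "__main__":
    # Hypothetical splits: 3 nodes as 2+1, 4 nodes as 2+2, 2 nodes as 1+1.
    for split in ([2, 1], [2, 2], [1, 1]):
        print(split, quorum_alone_decides(split))
```

With these assumptions only the 2+1 split is decided by quorum alone. The
2+2 split leaves nobody with quorum, and in the 1+1 split both nodes believe
they may run the resources, which is exactly the case where you want STONITH.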
