On Fri, Aug 12, 2011 at 08:58:05AM +1000, Andrew Beekhof wrote:
> On Thu, Aug 11, 2011 at 9:29 PM, Sam Sun <[email protected]> wrote:
> > Hi All,
> > This is Sam for Ericsson IPWorks product maintenance team. We have an
> > urgent problem on the Linux HA solution.
> > I am not sure if this is the right mail box, however it is very appreciated
> > if any one can help us.
> > Our product has used SLES 10 SP4 X86_64 with HA version 2.1.4-0.24.9.
>
> I'd contact SUSE - you pay them to give you their full attention :-)
>
> > We have a problem in the STONITH implement. There are only two nodes in HA
> > cluster.
> > However if there is split brain situation, Two HA nodes will shutdown
> > the peer nodes both at the same time?
>
> Yes
>
> > Then we only let STONTH running in one of HA nodes, is this a right
> > configuration?
>
> No.
>
> > Is there any Best Practice for STONITH implementation in HA which only has
> > two nodes?

I assume you are already aware of http://ourobengr.com/ha

Besides that, you may want to add a random (or node-dependent) delay to
the stonith agent action, to increase the chance during a split brain
that one node shoots the other before being shot itself.

So e.g. you have nodes A and B, and you modify the stonith agent to
always sleep(x) on node A when shooting node B, but not to sleep at all
on node B when shooting node A.

If it is an actual node crash, worst case you need x more seconds for
the stonith action. If it was a split brain, with both nodes still
alive, chances are that only A will be shot.

Typically the DC before the split brain will have a slight advantage
anyway, so simultaneously "successfully" shooting each other should not
be that common.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
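
To make the staggered-delay idea above concrete, here is a minimal sketch in Python. It is not a real stonith agent and does not use any Heartbeat or Pacemaker API; the node names, the 5-second value for x, and the helper names (`FENCE_DELAY`, `fence_delay`, `split_brain_survivor`, `delayed_fence`) are all made up for illustration. It only models the decision "how long do I wait before shooting my peer" and who wins the race when both nodes start at the same moment.

```python
import time

# Hypothetical per-node delays in seconds (the "x" from the text above).
# Node A waits before shooting B; node B shoots A immediately.
FENCE_DELAY = {"A": 5.0, "B": 0.0}


def fence_delay(local_node, delays=FENCE_DELAY):
    """Return how long local_node should sleep before fencing its peer."""
    return delays.get(local_node, 0.0)


def split_brain_survivor(delays=FENCE_DELAY):
    """Model a split brain where all nodes start fencing at the same time.

    The node with the shortest delay fires first and survives.
    Returns None if the two shortest delays are equal (no clear winner).
    """
    ordered = sorted(delays.items(), key=lambda item: item[1])
    if len(ordered) > 1 and ordered[0][1] == ordered[1][1]:
        return None
    return ordered[0][0]


def delayed_fence(local_node, shoot_peer, delays=FENCE_DELAY, sleep=time.sleep):
    """Sleep for the node-specific delay, then run the real fencing action.

    shoot_peer is a placeholder callable standing in for whatever the
    actual stonith agent does to power off the peer.
    """
    sleep(fence_delay(local_node, delays))
    return shoot_peer()
```

With these assumed values, a real crash of B costs node A an extra 5 seconds before B is fenced, while in a split brain B fires first and A is the one that gets shot. Equal delays give no winner, which is the situation the asymmetric delay is meant to avoid.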
