The first test my boss likes to apply to an HA setup is to remove the power cords from the back of the running primary server.
By having a stonith device (IBM RSA) running from the same power as the host the failover no longer happens. :-(

We could power the RSA independently - maybe there is a battery backed power pack available for it - who knows.

Otherwise my boss will pull all three power cables at the same time - two for the server plus one for the RSA!

-- Alex

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Andrew Beekhof
Sent: Thursday, 30 October 2008 5:25 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Stonith, 2 node cluster - on loss of power to primary node; failure to secondary didn't happen.

On Thu, Oct 30, 2008 at 02:01, Andreas Mock <[EMAIL PROTECTED]> wrote:
> Andrew Beekhof wrote:
>>
>> Not if the power loss includes power loss to the stonith device (which
>> as you said, is what happens in your case).
>> The only real solution is to add a stonith mechanism that doesn't have
>> this design problem (possibly in addition to the existing one).
>>
>> Unfortunately, anything else leaves you as vulnerable as if stonith
>> wasn't enabled in the first place.
>>
>
> Hi Andrew,
>
> you must be more precise: "...as vulnerable to a total power failure of a
> node as if stonith wasn't enabled in the first place."
> You get a reward of enabling this stonith device compared to having no stonith
> at all, don't you? :-)
> (e.g. software bugs,

yes

> resource overload,

yes

> network failures of heartbeat-link)

of just the heartbeat link yes... but a general network failure looks
the same as a power outage.
in both cases the other side (including the stonith device) is unresponsive.

> This special scenario of power outage of one node is IMHO not very likely in
> a production HA environment.
> Why? Every node has two power supplies. Every power supply is connected to
> an extra APS which is
> connected to an extra power line. Every power supply is monitored for
> failure to be replaced in time.
> If you can't or don't want to afford this kind of redundancy you have to
> live with a service outage in that
> special scenario.

agreed
mostly i was cautioning against doing something to force the cluster
to continue because doing that compromises the cluster's behavior in
other scenarios too

> But to nail it down: Everything is better than mounting a regular filesystem
> from more than one node!! :-)

well yeah ;-)
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
