The first test my boss likes to apply to an HA setup is to pull the power
cords from the back of the running primary server.

With a stonith device (IBM RSA) running from the same power as the host,
the failover no longer happens.  :-(

We could power the RSA independently - maybe there is a battery-backed power
pack available for it - who knows.  Otherwise my boss will pull all three
power cables at the same time - two for the server plus one for the RSA!



--
Alex


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Andrew Beekhof
Sent: Thursday, 30 October 2008 5:25 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Stonith, 2 node cluster - on loss of power to
primary node; failure to secondary didn't happen.

On Thu, Oct 30, 2008 at 02:01, Andreas Mock <[EMAIL PROTECTED]> wrote:
> Andrew Beekhof schrieb:
>>
>> Not if the power loss includes power loss to the stonith device (which
>> as you said, is what happens in your case).
>> The only real solution is to add a stonith mechanism that doesn't have
>> this design problem (possibly in addition to the existing one).
>>
>> Unfortunately, anything else leaves you as vulnerable as if stonith
>> wasn't enabled in the first place.
>>
>
> Hi Andrew,
>
> you must be more precise: "...as vulnerable to a total power failure of a
> node as if stonith wasn't enabled in the first place."
> You get a reward from enabling this stonith device compared to having no
> stonith at all, don't you? :-)
> (e.g. software bugs,

yes

> resource overload,

yes

> network failures of heartbeat-link)

of just the heartbeat link yes...

but a general network failure looks the same as a power outage.
in both cases the other side (including the stonith device) is
unresponsive.
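
The point about indistinguishable failures can be sketched in Python. This is
an illustration only, not how heartbeat or any stonith plugin is implemented:
PeerView and can_take_over are hypothetical names, and the model assumes the
surviving node sees just two signals from its peer.

```python
from dataclasses import dataclass


@dataclass
class PeerView:
    """What the surviving node can observe about its peer (hypothetical model)."""
    heartbeat_ok: bool       # peer answers on the heartbeat link
    stonith_reachable: bool  # the peer's stonith device (e.g. an RSA) answers


def can_take_over(view: PeerView) -> bool:
    """Return True only when taking over the resources is known to be safe."""
    if view.heartbeat_ok:
        return False  # peer is alive, nothing to take over
    # The peer is silent, so it has to be fenced before resources start here.
    # Fencing can only be confirmed if the stonith device responds.
    return view.stonith_reachable


# Three different causes, as seen from the surviving node.
heartbeat_link_cut = PeerView(heartbeat_ok=False, stonith_reachable=True)
power_outage = PeerView(heartbeat_ok=False, stonith_reachable=False)
network_outage = PeerView(heartbeat_ok=False, stonith_reachable=False)
```

Only the cut heartbeat link leaves the stonith device reachable, so it is the
only case where takeover can proceed; the power outage and the general network
failure produce identical observations.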

> This special scenario of a power outage of one node is IMHO not very likely
> in a production HA environment.
> Why? Every node has two power supplies. Every power supply is connected to
> an extra APS which is connected to an extra power line. Every power supply
> is monitored for failure so that it can be replaced in time.
> If you can't or don't want to afford this kind of redundancy you have to
> live with a service outage in that special scenario.

agreed
mostly i was cautioning against doing something to force the cluster to
continue, because doing that compromises the cluster's behavior in other
scenarios too

> But to nail it down: Anything is better than mounting a regular
> filesystem from more than one node!! :-)

well yeah ;-)
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

