Re: [Linux-HA] stonith failed to start

Terry L. Inzauro Thu, 20 Aug 2009 07:56:25 -0700

Dejan Muhamedagic wrote:
> Hi,
> 
> On Thu, Aug 20, 2009 at 03:58:19PM +0200, Andrew Beekhof wrote:
>> On Thu, Aug 20, 2009 at 3:56 PM, Terry L.
>> Inzauro<[email protected]> wrote:
>>
>>> Ok. I am indeed using 'external/ssh' as the stonith device.   I figure it 
>>> was better than nothing as I do not have access to
>>> a hardware stonith device.  In your opinion, is using the 'external/ssh'
>>> plugin 'better' than NOT using a stonith plugin at all?
>> personally, i think so.
>> but there are plenty that disagree.
> 
> Ah, that would include me :)
> 
> If the stonith device fails to fence the failing node then there
> is no failover and you get zero availability. The probability
> that that happens is much higher when using a device such as
> external/ssh since it depends on both the network availability
> and the OS health. I'll leave it to you to figure out in how many
> ways these two dependencies can hinder a fencing operation.
> 
> Thanks,
> 
> Dejan



Ahem.

How many ways to hinder? Let me count the ways. Glad I got that out of my
system. Now on to the business at hand.

--------------------

There may be many different failures, but I guess I would have to split them
into two groups: probable and improbable.

Probable list:
1. Physical network link failure
2. Ethernet switch fabric failure
3. Administrator error (accidentally breaking the network configuration,
including breaking sshd)

Improbable list:
1. IP stack failure
2. Unexpected OS errors (Linux is pretty stable these days)
3. Ethernet adapter failure (I can't remember the last time I saw an Ethernet
card fail)


Having said all that, one could conclude that roughly 99% of the failures
affecting the 'external/ssh' stonith device are network related. So, my last
question is:

Can the 'external/ssh' stonith plugin be configured to be "network fault
tolerant"? For instance:

   <nvpair id="stonithclone-attr-1" name="hostlist" value="node1 node1-c node2 node2-c"/>

where:
node1 = communications over eth0 and switch0
node1-c = communications over eth1 via xover
node2 = communications over eth0 and switch0
node2-c = communications over eth1 via xover
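To make the naming convention concrete, here is a minimal sketch of how such a hostlist value could be grouped into per-node paths. The "-c" suffix meaning "crossover path" is only the convention described above; external/ssh itself does not interpret it, and group_hostlist is a hypothetical helper, not part of the plugin.

```python
# Hypothetical helper: the "-c" suffix convention comes from the hostlist
# described above; the external/ssh plugin does not interpret it itself.
def group_hostlist(hostlist, suffix="-c"):
    """Group a space-separated hostlist into {node: [addresses...]}.

    Addresses keep their hostlist order, so listing "node1" before
    "node1-c" makes the switched (eth0) path the primary one.
    """
    groups = {}
    for name in hostlist.split():
        # Strip the crossover suffix to find which node this address belongs to.
        base = name[: -len(suffix)] if name.endswith(suffix) else name
        groups.setdefault(base, []).append(name)
    return groups


print(group_hostlist("node1 node1-c node2 node2-c"))
# {'node1': ['node1', 'node1-c'], 'node2': ['node2', 'node2-c']}
```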

The desired logic is this:

if node1 communication to node2 fails
then
    use node1-c communications to node2-c
    if that also fails
    then
        stonith thy self
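The fallback described above (try the switched path first, then the crossover, and fence yourself only if both fail) can be sketched as a small decision function. This is an illustration under assumptions, not how external/ssh behaves today: fence_peer, PATHS and the reset callable are hypothetical names, and reset stands in for whatever actually issues the reset over one address.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical paths for the two-node example: switched link first,
# crossover ("-c") link second.
PATHS = {
    "node1": ["node1", "node1-c"],
    "node2": ["node2", "node2-c"],
}


def fence_peer(peer: str,
               paths: Mapping[str, Sequence[str]],
               reset: Callable[[str], bool]) -> str:
    """Try to reset `peer` over each of its paths, in order.

    `reset` is an assumed stand-in (not an external/ssh API) that attempts
    a reset over one address and returns True on success.
    """
    for address in paths[peer]:
        if reset(address):
            return "fenced " + peer + " via " + address
    # Both the switched and the crossover path failed.
    return "self-fence"


# Simulate a switch failure: only the crossover addresses answer.
print(fence_peer("node2", PATHS, lambda addr: addr.endswith("-c")))
# fenced node2 via node2-c
```

Passing reset in as a function keeps the path-selection logic separate from the transport, so the fallback order can be exercised without a real cluster.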


I would say the probability of both links failing is slim. This setup would
then mitigate the failures on the "probable" list, right?
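The "slim" intuition can be put into numbers with a minimal model. It assumes the two links fail independently and adds a hypothetical p_shared term for anything both paths depend on (the target's OS, sshd, a bad configuration), which defeats both links at once. The probabilities used below are purely illustrative, not measurements.

```python
def fencing_failure_probability(p_primary, p_crossover, p_shared=0.0):
    """Probability that no path can deliver the reset.

    Assumes the two links fail independently of each other and of the
    shared components; p_shared is a hypothetical failure probability
    for whatever both paths rely on (target OS, sshd, configuration).
    """
    both_links_down = p_primary * p_crossover
    # Fencing fails if the shared part fails OR both links are down.
    return 1 - (1 - p_shared) * (1 - both_links_down)


# Illustrative numbers only: 1% per link, with and without a shared term.
print(round(fencing_failure_probability(0.01, 0.01), 4))         # 0.0001
print(round(fencing_failure_probability(0.01, 0.01, 0.005), 4))  # 0.0051
```

Under these assumptions the redundant link shrinks the link-failure term to the product of the two, while any shared dependency is added on top unchanged.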






_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
