On 11/7/2009 at 06:06 AM, c smith <[email protected]> wrote: 
> On Thu, Nov 5, 2009 at 10:30 PM, Tim Serong <[email protected]> wrote: 
> > 
> > That's what'll happen.  Absent evidence of life from either node, the only 
> > safe thing to do is try to kill it.  With only two nodes, you can't assume 
> > anything else, as there's no clear majority.  By comparison, if there were 
> > three nodes, and one node couldn't see the others, it could safely assume 
> > that *it* was faulty. 
> > 
> > HTH, 
> > 
> > Tim 
> > 
> > 
> Hi Tim- 
>  
> Thanks much for the quick response.   One more scenario/question.. I've got 
> a 2 node cluster with 3 NICs each, 2 of which are used for direct crossover 
> cluster communication and the other goes to the switched network+stonith 
> device.  If those 2 cross-connections are degraded such that cluster 
> communication ceases, each node will send a STONITH request to the device 
> for its peer, correct?

Yes.

> In the event that both requests make it to the 
> STONITH device, both nodes would be shot?

Yes.  Speaking of which, you might be mildly interested in reading:

  http://ourobengr.com/ha

> Is this a design flaw on my part?  Should all 3 interfaces be used for
> cluster communication? 

Depends how paranoid you are :)  On the general principle that one tries to 
avoid single points of failure, you've already achieved this for cluster 
communication by having two network links.

That being said, questions to consider include:

- If one link fails, can you get out there and fix it before the second
  fails, and STONITH ensues?
- What is the chance of both network links failing simultaneously?
  (possibly greater for, say, a dual-port NIC vs. separate single-port
  NICs...  Or two NICs on the same bus, vs. different buses)
- If two links failed, what's the likelihood the third would also fail
  at the same time (somewhere, there is a point of diminishing returns)?
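
To put rough numbers on the questions above, here is a back-of-the-envelope
sketch.  Everything in it is an assumption for illustration: the function
name, the per-link failure probability, and the "common cause" probability
(something like a shared bus or a dual-port NIC dying) are made up, not
measured values.  The model simply says: either a shared fault takes out
all links at once, or each link fails independently.

```python
def all_links_down(p_link: float, n_links: int, p_common: float = 0.0) -> float:
    """Rough probability that every cluster link is down at the same time.

    p_link   -- hypothetical chance that one link fails on its own in some
                time window (assumed identical and independent per link)
    n_links  -- number of redundant links
    p_common -- hypothetical chance of a shared fault (same bus, same
                dual-port NIC) that takes out all links together
    """
    independent = p_link ** n_links
    # Either the shared fault happens, or it does not and all links
    # happen to fail independently.
    return p_common + (1.0 - p_common) * independent


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "links, no shared fault: ", all_links_down(0.01, n))
        print(n, "links, with shared fault:", all_links_down(0.01, n, p_common=0.001))
```

With these made-up inputs, once any shared failure mode exists it dominates
the result, so a third link on the same hardware changes very little.  That
is the point of diminishing returns mentioned above.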

It's also worth thinking about the single connection to the STONITH device, 
which could also fail.  This won't necessarily be catastrophic (one node won't 
take over the other's resources unless STONITH succeeds, so there shouldn't be 
any problem with data corruption) but it does mean that failover won't occur 
without manual intervention if the STONITH device is inaccessible.
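
The "no takeover unless STONITH succeeds" rule can be sketched in a few
lines.  This is only an illustration of the ordering, not how the cluster
software is actually implemented; fence_peer and start_resources are
hypothetical callbacks invented for the example.

```python
from typing import Callable


def try_failover(fence_peer: Callable[[], bool],
                 start_resources: Callable[[], None]) -> bool:
    """Take over the peer's resources only after fencing is confirmed.

    fence_peer      -- hypothetical callback; returns True only if the
                       STONITH device confirmed the peer was shot
    start_resources -- hypothetical callback that starts the peer's
                       resources on the local node
    """
    if not fence_peer():
        # STONITH device unreachable or fencing failed: do nothing and
        # wait for manual intervention, so both nodes can never run the
        # same resources at once.
        return False
    start_resources()
    return True


if __name__ == "__main__":
    started = []
    # Simulate an unreachable STONITH device: fencing reports failure.
    ok = try_failover(lambda: False, lambda: started.append("resources"))
    print("failover happened:", ok, "| resources started:", started)
```

The trade-off is the one described above: an unreachable STONITH device
costs you availability (no automatic failover) but not data integrity.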

Regards,

Tim


-- 
Tim Serong <[email protected]>
Senior Clustering Engineer, Novell Inc.



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
