Re: [Linux-HA] what to do on loss of network

alex Thu, 24 Jan 2008 14:26:37 -0800

Quoting Dejan Muhamedagic <[EMAIL PROTECTED]>:

Hi,


On Fri, Jan 25, 2008 at 09:03:01AM +1300, Steve Wray wrote:

Forgive top posting but I just noted this in some documentation:

"Provided both HA nodes can communicate with each other, ipfail can
reliably detect when one of their network links has become unusable, and
compensate."

In the example which I give this is not the case; the loss of connectivity
is complete. The nodes cannot communicate with one another.


That's called split brain. Not a very nice thing for clusters.
Definitely to be avoided. See http://www.linux-ha.org/SplitBrain

I'm confronted with a similar issue, but for a different reason. The two nodes in our heartbeat/drbd cluster are located in physically remote data centers. We have 2 separate GigE interfaces, routed through separate switches, to maintain heartbeat communication.

Every few months, our IT department simulates a disaster which would make our primary data center completely unavailable. They cut both the fiber links between the centers, so the primary data center has absolutely no connectivity. This forces all systems to operate from the backup center, just to be sure we'll be ready if something actually demands it.

We've never had any problems running out of the backup center, except that we're guaranteed to have a split-brain when the 2 centers are reconnected. I would really like to figure out a way to tell the node in the primary center 'if you can't contact your ping node, don't become primary'.

Is this an unreasonable goal? Is there something I could do differently in this situation? The only other option I can think of is to sit at my desk in the primary center during the test and manually stop heartbeat just before the test starts.
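
To make the rule concrete, here is a rough sketch of the decision I have in mind. It is only an illustration, not anything heartbeat or drbd provides: the names may_become_primary and ping_once are hypothetical, the address in the usage note is a placeholder, and the ping flags assume the Linux iputils ping.

```python
import subprocess
from typing import Callable, Iterable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Return True if one ICMP echo to host succeeds.

    Assumes Linux iputils ping, where -c is the packet count and
    -W is the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def may_become_primary(
    ping_nodes: Iterable[str],
    is_reachable: Callable[[str], bool] = ping_once,
) -> bool:
    """Allow promotion only if at least one ping node answers.

    A node that cannot reach any ping node is assumed to be the
    isolated side and should stay secondary.
    """
    return any(is_reachable(host) for host in ping_nodes)
```

A wrapper script on the primary-center node could call may_become_primary(["192.0.2.1"]) before taking over resources and refuse to promote when it returns False. The reachability check is passed in as a parameter so the rule can be exercised without real network traffic.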


thanks,
alex
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
