Quoting Dejan Muhamedagic <[EMAIL PROTECTED]>:

> Hi,
>
> On Fri, Jan 25, 2008 at 09:03:01AM +1300, Steve Wray wrote:
>> Forgive top posting but I just noted this in some documentation:
>>
>> "Provided both HA nodes can communicate with each other, ipfail can
>> reliably detect when one of their network links has become unusable, and
>> compensate."
>>
>> In the example which I give this is not the case; the loss of connectivity
>> is complete. The nodes cannot communicate with one another.
>
> That's called split brain. Not a very nice thing for clusters.
> Definitely to be avoided. See http://www.linux-ha.org/SplitBrain

I'm confronted with a similar issue, but for a different reason. The two nodes in our heartbeat/drbd cluster are located in physically remote data centers. We have two separate GigE interfaces, routed through separate switches, to maintain heartbeat communication.

Every few months, our IT department simulates a disaster which would make our primary data center completely unavailable. They cut both the fiber links between the centers, so the primary data center has absolutely no connectivity. This forces all systems to operate from the backup center, just to be sure we'll be ready if something actually demands it.

We've never had any problems running out of the backup center, except that we're guaranteed to have a split-brain when the two centers are reconnected. I would really like to figure out a way to tell the node in the primary center 'if you can't contact your ping node, don't become primary'.

Is this an unreasonable goal? Is there something I could do differently in this situation? The only other option I can think of is to sit at my desk in the primary center during the test and manually stop heartbeat just before the test starts.
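
For illustration, the manual step described above could perhaps be automated with a small watchdog running on the primary node. The following is only a sketch in Python, not a heartbeat feature: the ping node address, the failure threshold, the check interval, and the init script path are all assumptions chosen for the example, and the function names are hypothetical. It stops heartbeat once the ping node has been unreachable for several consecutive checks. It does not by itself guarantee that DRBD avoids split brain, since the node may already have accepted writes before the watchdog reacts.

```python
import subprocess
import time

# All values below are assumptions for illustration, not taken from the thread.
PING_NODE = "192.0.2.1"        # hypothetical ping node (documentation address range)
MAX_FAILURES = 3               # consecutive failed checks before standing down
CHECK_INTERVAL = 5             # seconds between checks
STOP_COMMAND = ["/etc/init.d/heartbeat", "stop"]  # assumed init script location


def ping_ok(host, timeout_s=2):
    """Return True if one ICMP echo request to host succeeds (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def count_trailing_failures(results):
    """Count the consecutive False entries at the end of a list of check results."""
    count = 0
    for ok in reversed(results):
        if ok:
            break
        count += 1
    return count


def should_stand_down(results, max_failures=MAX_FAILURES):
    """Decide whether the node should give up its resources."""
    return count_trailing_failures(results) >= max_failures


def watch(host=PING_NODE):
    """Poll the ping node and stop heartbeat after repeated consecutive failures."""
    results = []
    while True:
        results.append(ping_ok(host))
        results = results[-MAX_FAILURES:]  # keep only the most recent checks
        if should_stand_down(results):
            # Stopping heartbeat releases the resources this node holds.
            subprocess.run(STOP_COMMAND, check=False)
            return
        time.sleep(CHECK_INTERVAL)
```

Calling `watch()` from a small wrapper started at boot would run the loop. Keeping the decision in `should_stand_down` separate from the ping and stop commands makes the threshold logic easy to check without touching the cluster.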

thanks,
alex
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
