Quoting Dejan Muhamedagic <[EMAIL PROTECTED]>:

> Hi,
>
> On Fri, Jan 25, 2008 at 09:03:01AM +1300, Steve Wray wrote:
> > Forgive top posting but I just noted this in some documentation:
> >
> > "Provided both HA nodes can communicate with each other, ipfail can
> > reliably detect when one of their network links has become unusable, and
> > compensate."
> >
> > In the example which I give this is not the case; the loss of connectivity
> > is complete. The nodes cannot communicate with one another.
>
> That's called split brain. Not a very nice thing for clusters.
> Definitely to be avoided. See http://www.linux-ha.org/SplitBrain

I'm confronted with a similar issue, but for a different reason. The
two nodes in our heartbeat/drbd cluster are located in physically
remote data centers. We have 2 separate GigE interfaces, routed
through separate switches, to maintain heartbeat communication.
Every few months, our IT department simulates a disaster which would
make our primary data center completely unavailable. They cut both
the fiber links between the centers, so the primary data center has
absolutely no connectivity. This forces all systems to operate from
the backup center, just to be sure we'll be ready if something
actually demands it.
We've never had any problems running out of the backup center, except
that we're guaranteed to have a split-brain when the 2 centers are
reconnected. I would really like to figure out a way to tell the node
in the primary center 'if you can't contact your ping node, don't
become primary'.
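
To make that rule concrete, here is roughly the decision I have in mind, as a small Python sketch. The function names and the ping address are made up for illustration; this is not something heartbeat provides, and it assumes a Linux-style `ping` command.

```python
import subprocess


def ping_node_reachable(address: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request; True if the ping command reports success.

    Assumes a Linux `ping` that accepts -c (count) and -W (timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def may_take_over(peer_reachable: bool, ping_reachable: bool) -> bool:
    """Hypothetical policy for a node that is deciding whether to be primary.

    - Peer still reachable: no takeover is needed.
    - Peer silent, ping node reachable: the peer is probably down, take over.
    - Peer silent, ping node unreachable: this node is the isolated one,
      so it must not become (or stay) primary.
    """
    if peer_reachable:
        return False
    return ping_reachable


# Example use (192.0.2.1 is a placeholder documentation address):
#   may_take_over(peer_reachable=False,
#                 ping_reachable=ping_node_reachable("192.0.2.1"))
```

During the disaster test the node in the primary center would see neither its peer nor its ping node, so under this rule it would stay out of the way, while the node in the backup center would still see its ping node and carry on.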
Is this an unreasonable goal? Is there something I could do
differently in this situation? The only other option I can think of
is to sit at my desk in the primary center during the test and
manually stop heartbeat just before the test starts.
thanks,
alex