Dejan Muhamedagic wrote:
Hi,

On Fri, Jan 25, 2008 at 09:03:01AM +1300, Steve Wray wrote:
Forgive top posting but I just noted this in some documentation:

"Provided both HA nodes can communicate with each other, ipfail can reliably detect when one of their network links has become unusable, and compensate."

In the example which I give this is not the case; the loss of connectivity is complete. The nodes cannot communicate with one another.

That's called split brain. Not a very nice thing for clusters.
Definitely to be avoided. See http://www.linux-ha.org/SplitBrain

One of the nodes can still contact its 'ping' node but not the other node in the cluster. It is still on the network and can still provide NFS service.

The other node cannot contact its 'ping' node and also cannot contact the other node in the cluster. It is not on the network at all. It has a dead network connection.

I need the node with *zero* connectivity to *not* take over as the active node, as that makes no sense at all; it's not on the network, so it is pointless to bring up NFS. It should just sit and wait for connectivity to be restored, doing nothing but monitoring the state of its network connection.

Neither node knows what is happening on the other side. So, they
both consider that the other node is dead (that's a two-node
cluster quorum policy which could be ensured to be sane if you
had stonith configured)

I don't want the other node 'shot in the head' though. Network failure could be transitory and when it comes back I don't want to have to manually restart the 'shot' server.


and try to acquire resources. ipfail
could ask the node to relinquish all resources in case of no
connectivity, but it doesn't, probably because nobody ever
needed such a thing.

In this instance the two servers are on a test bed, both running on the same Xen host and connected via the Xen bridge.

In production there are two physical hosts, connected via a crossover cable. A duplicate of each of the test-bed virtual machines runs on each of the two physical hosts.

There's also no way to use a serial connection in this setup.

The only possible channel of communication between the two nodes is via a single network connection.

This network connection could fail on one or both nodes.

If it fails, then any node which cannot reach the network should just 'sit down and shut up' until the network is restored (not 'turn off', which is what I read as being implied by stonith).

Can stonith be used to induce a transient shutdown? E.g. turn off heartbeat and wait for the network to come back, at which point turn heartbeat back on.
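Not with stonith as such, but an external watchdog outside heartbeat could approximate this: stop heartbeat while the node's own ping target is unreachable, and start it again once connectivity returns. A rough sketch, not a tested solution — the gateway address, the Debian-style /etc/init.d/heartbeat path, and the poll interval are all assumptions to adjust for your setup:

```shell
#!/bin/sh
# net-watchdog.sh -- stop heartbeat while our ping node is unreachable,
# restart it when connectivity returns.  PING_NODE and HB_INIT are
# assumptions; change them to match your environment.
PING_NODE=${PING_NODE:-192.168.1.254}
HB_INIT=${HB_INIT:-/etc/init.d/heartbeat}

net_up() {
    # one ICMP probe with a 2-second timeout; the exit status is the verdict
    ping -c 1 -W 2 "$PING_NODE" >/dev/null 2>&1
}

# next_action CURRENT_STATE NET_STATUS -> prints: start | stop | none
next_action() {
    if [ "$2" = up ] && [ "$1" = down ]; then
        echo start              # connectivity is back: rejoin the cluster
    elif [ "$2" = down ] && [ "$1" = up ]; then
        echo stop               # no network: sit down and shut up
    else
        echo none               # no transition; leave heartbeat alone
    fi
}

watchdog() {
    state=up
    while :; do
        if net_up; then net=up; else net=down; fi
        case $(next_action "$state" "$net") in
            start) "$HB_INIT" start; state=up ;;
            stop)  "$HB_INIT" stop;  state=down ;;
        esac
        sleep 10
    done
}

# only enter the loop when invoked with "run"; the file can be sourced safely
if [ "${1:-}" = run ]; then
    watchdog
fi
```

Note this trades one problem for another: with heartbeat stopped, the node rejoins the cluster cold when the network returns, so whether that beats a stonith reset depends on how your resources behave on a restart.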


Thanks,

Dejan

Steve Wray wrote:
Dejan Muhamedagic wrote:
Hi,

On Thu, Jan 24, 2008 at 09:39:05AM +1300, Steve Wray wrote:
Well I posted my config and I've tried various things and tested this setup... and it still behaves incorrectly: going primary in the event of a complete loss of network connectivity.

I mean... it's an NFS server... *network* filesystem. If it can't connect to the network *at* *all*, it makes no sense for it to become the primary NFS server...

I'd really appreciate some comment on what may be wrong in the config files that I've posted. If there's any further info that I need to post, please mention it.
Did you check if ipfail is running? If not, then you have to
check the user in the respawn line. Otherwise, please post the
logs.
Thanks for your reply!
ipfail is running, the user in the respawn line is correct.
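For reference, the relevant ha.cf lines in a setup like this typically look something like the following (a sketch only — the gateway address and node names are placeholders, not taken from the posted config):

```
# /etc/ha.d/ha.cf (fragment)
ping 192.168.1.254                          # the 'ping' node, e.g. the default gateway
respawn hacluster /usr/lib/heartbeat/ipfail # run ipfail as the cluster user
node nfs1 nfs2
auto_failback on
```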
I just ran a test failure of the network interface in the non-primary node. Here are the logs from this test run only from the 'failed' node. ipfail determines that "We are dead" and then heartbeat decides to take over as primary.
Could this be a problem with "/etc/ha.d/rc.d/status status"?
------------------------------------------------------------------------
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
