Hi,

On Fri, Jan 25, 2008 at 09:03:01AM +1300, Steve Wray wrote:
> Forgive top posting but I just noted this in some documentation:
>
> "Provided both HA nodes can communicate with each other, ipfail can
> reliably detect when one of their network links has become unusable, and
> compensate."
>
> In the example which I give this is not the case; the loss of connectivity
> is complete. The nodes cannot communicate with one another.
That's called split brain. Not a very nice thing for clusters, and
definitely to be avoided. See http://www.linux-ha.org/SplitBrain

> One of the nodes can still contact its 'ping' node but not the other node
> in the cluster. It is still on the network and can still provide NFS
> service.
>
> The other node cannot contact its 'ping' node and also cannot contact the
> other node in the cluster. It is not on the network at all. It has a dead
> network connection.
>
> I need for the node with *zero* connectivity to *not* take over as the
> active node as this makes no sense at all; it's not on the network, it is
> pointless bringing up NFS. It should just sit and wait for connectivity to
> be restored and do nothing but monitor the state of its network connection.

Neither node knows what is happening on the other side. So they both
consider the other node dead (that's the two-node cluster quorum policy,
which could be made safe if you had STONITH configured) and try to acquire
resources. ipfail could ask the node to relinquish all resources when it
has no connectivity at all, but it doesn't, probably because nobody has
ever needed such a thing.

Thanks,

Dejan

>
> Steve Wray wrote:
>> Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Thu, Jan 24, 2008 at 09:39:05AM +1300, Steve Wray wrote:
>>>> Well I posted my config and I've tried various things and tested this
>>>> setup... and it still behaves incorrectly: going primary in the event of
>>>> a complete loss of network connectivity.
>>>>
>>>> I mean... it's an NFS server... *network* filesystem. If it can't connect
>>>> to the network *at* *all* it makes no sense to become the primary NFS
>>>> server...
>>>>
>>>> I'd really appreciate some comment on what may be wrong in the config
>>>> files that I've posted. If there's any further info that I need to post
>>>> please mention it.
>>>
>>> Did you check if ipfail is running? If not, then you have to
>>> check the user in the respawn line. Otherwise, please post the
>>> logs.
>> Thanks for your reply!
>>
>> ipfail is running, and the user in the respawn line is correct.
>>
>> I just ran a test failure of the network interface in the non-primary
>> node. Here are the logs from this test run, only from the 'failed' node.
>> ipfail determines that "We are dead" and then heartbeat decides to take
>> over as primary.
>>
>> Could this be a problem with "/etc/ha.d/rc.d/status status"?
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
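Dejan's explanation of why both nodes end up acquiring resources, and the extra connectivity check Steve is asking for, can be sketched as two toy decision rules. This is a hypothetical model for illustration only: `NodeView`, `default_two_node_policy` and `connectivity_aware_policy` are made-up names, and neither function reproduces heartbeat's or ipfail's actual logic. It only models the moment after a node stops hearing its peer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeView:
    """What one node can observe after heartbeats from its peer stop.

    Hypothetical model; not a heartbeat or ipfail data structure.
    """
    name: str
    peer_reachable: bool       # heartbeats from the other node still arrive
    ping_node_reachable: bool  # the configured 'ping' node still answers


def default_two_node_policy(view: NodeView) -> bool:
    """Behaviour described in the thread: a silent peer is presumed dead,
    so the node tries to acquire resources."""
    return not view.peer_reachable


def connectivity_aware_policy(view: NodeView) -> bool:
    """Hypothetical rule Steve asks for: take over only if the peer is
    silent AND this node can still reach its ping node."""
    return not view.peer_reachable and view.ping_node_reachable


# Scenario from the thread: the nodes cannot see each other; node A still
# reaches its ping node, node B has no network connectivity at all.
node_a = NodeView("A", peer_reachable=False, ping_node_reachable=True)
node_b = NodeView("B", peer_reachable=False, ping_node_reachable=False)

for policy in (default_two_node_policy, connectivity_aware_policy):
    takers = [v.name for v in (node_a, node_b) if policy(v)]
    print(policy.__name__, takers)
```

Under the default rule both A and B decide to take over (split brain); under the connectivity-aware rule only A does. Note that the extra check only narrows the problem: if both nodes could still reach the ping node but not each other, both would still take over, which is why a real fix relies on STONITH rather than on connectivity checks alone.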
