On Fri, 2007-04-20 at 13:55 -0600, Alan Robertson wrote:
> Faisal Shaikh wrote:
> > Hi all,
> >
> > I'm having trouble setting up a pair of machines with a single
> > resource (an IP address) to fail over between them.
> > The machines are Sun Netras (T1 105) running Gentoo Linux.
> >
> > The scenario is as follows:
> > fw1: (primary resource holder)
> > eth0: 192.168.1.52
> > eth1: 10.0.0.2
> >
> > fw3: (secondary resource holder)
> > eth0: 192.168.1.60
> > eth1: 10.0.0.1
> >
> > eth1 is used as a private network between these two machines for the
> > heartbeat. (I haven't got the correct cable type to use the serial
> > connection for the heartbeat.)
> >
> > The IP address fails over correctly in the following cases:
> >
> > 1. When I switch off the primary resource holder.
> > 2. When I stop heartbeat on the primary resource holder.
> >
> > However, if I disconnect the primary resource holder from the network
> > so that it can't ping the ping nodes, the IP address does not fail
> > over to the secondary resource holder.
> >
> > After disconnecting the cable, the log entries on the primary resource
> > holder are as follows:
> >
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: WARN: node 192.168.1.2: is dead
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: WARN: node 192.168.1.3: is dead
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: debug: StartNextRemoteRscReq(): child count 1
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: info: Link 192.168.1.2:192.168.1.2 dead.
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: info: Link 192.168.1.3:192.168.1.3 dead.
> > Apr 20 19:47:03 fw1 heartbeat: [4496]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> > Apr 20 19:47:03 fw1 harc[4496]: info: Running /etc/ha.d/rc.d/status status
> > Apr 20 19:47:03 fw1 heartbeat: [4512]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> > Apr 20 19:47:03 fw1 harc[4512]: info: Running /etc/ha.d/rc.d/status status
> >
> > And it stays there doing nothing.
> >
> > My ha.cf file is as follows:
> >
> > ucast eth1 10.0.0.1
> > logfile /var/log/ha-log
> > debugfile /var/log/ha-debug
> > keepalive 2
> > warntime 10
> > deadtime 30
> > initdead 120
> > baud 19200
> > udpport 694
> > auto_failback on
> > node fw1
> > node fw3
> >
> > respawn hacluster /usr/lib/heartbeat/ipfail
> > ping 192.168.1.2 192.168.1.3
> > crm off
> >
> > My haresources file is:
> > fw1 192.168.1.100/32/192.168.1.255
> >
> > I'd appreciate it greatly if someone could point me in the right
> > direction please.
>
> You need redundant communication for ipfail to work.
>
> You see, ipfail will only fail over if the two nodes can communicate
> with each other, and agree to move things around.
>
> What you've done is create a split-brain, where each node thinks
> the other is dead. If the other is dead, who can take over?

Hi Alan,

Many thanks for your quick reply!

I thought that I did have a redundant communication link. The service
IP was on eth0 (the 192.168.1.0 network), while the heartbeat link was
on eth1 (the 10.0.0.0 network). After pulling the network cable on eth0,
I could still see the heartbeat packets and their replies on eth1 using
tcpdump.

After pulling the network cable on eth0, if I stop heartbeat on the
primary, the secondary takes over with no problems. This would indicate
that the servers are able to communicate, but the handover doesn't occur
if a NIC or cable goes down on the primary.

I'm going to try to obtain a null modem adaptor for a serial cable that
I'll run between the two Netras.

Regards,
Faisal

_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
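
For anyone reading this thread later: since the discussion turns on how many heartbeat paths the two nodes share, it can help to count the media directives that a ha.cf actually declares. The following is only a rough sketch and is not part of heartbeat. The function name heartbeat_paths is hypothetical, and treating only ucast, bcast, mcast and serial lines as peer-to-peer paths is an assumption made for illustration. "ping" lines are skipped on purpose, because they name ping nodes rather than a link to the other cluster node.

```python
# Assumption: these are the ha.cf directives that define a heartbeat
# path between cluster nodes. "ping" is excluded deliberately.
MEDIA_DIRECTIVES = {"ucast", "bcast", "mcast", "serial"}


def heartbeat_paths(ha_cf_text):
    """Return one string per heartbeat path declared in the ha.cf text."""
    paths = []
    for raw in ha_cf_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        if not fields or fields[0] not in MEDIA_DIRECTIVES:
            continue
        if fields[0] == "bcast":
            # A bcast line may list several interfaces; count each one.
            paths.extend("bcast " + iface for iface in fields[1:])
        else:
            paths.append(" ".join(fields))
    return paths


# The ha.cf posted at the start of this thread.
POSTED_HA_CF = """\
ucast eth1 10.0.0.1
logfile /var/log/ha-log
debugfile /var/log/ha-debug
keepalive 2
warntime 10
deadtime 30
initdead 120
baud 19200
udpport 694
auto_failback on
node fw1
node fw3
respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.1.2 192.168.1.3
crm off
"""

if __name__ == "__main__":
    print(heartbeat_paths(POSTED_HA_CF))  # prints ['ucast eth1 10.0.0.1']
```

Run against the posted file, this reports a single path (the ucast link on eth1). Adding the planned serial link would show up as a second entry once a serial line is present in ha.cf.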
