Re: [Linux-HA] ipfail not failing over when ping nodes are unpingable

Faisal Shaikh Mon, 23 Apr 2007 10:39:20 -0700

On Fri, 2007-04-20 at 13:55 -0600, Alan Robertson wrote: 
> Faisal Shaikh wrote:
> > Hi all,
> > 
> > I'm having trouble setting up a pair of machines with a single
> > resource (an IP address) to fail over between them.
> > The machines are Sun Netras (T1 105) running Gentoo Linux.
> > 
> > The scenario is as follows:
> > fw1: (primary resource holder)
> >     eth0: 192.168.1.52
> >     eth1: 10.0.0.2
> >     
> > fw3: (secondary resource holder)
> >     eth0: 192.168.1.60
> >     eth1: 10.0.0.1
> > 
> > 
> > eth1 is used as a private network between these two machines for the
> > heartbeat. (I haven't got the correct cable type to use the serial
> > connection for the heartbeat.)
> > 
> > 
> > The IP address fails over correctly in the following cases:
> > 
> > 1. When I switch off the primary resource holder.
> > 2. When I stop heartbeat on the primary resource holder.
> > 
> > However, if I disconnect the primary resource holder from the network so
> > that it can't ping the ping nodes, the IP address does not fail over to
> > the secondary resource holder.
> > 
> > After disconnecting the cable, the log entries on the primary resource
> > holder are as follows:
> > 
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: WARN: node 192.168.1.2: is dead
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: WARN: node 192.168.1.3: is dead
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: debug: StartNextRemoteRscReq(): child count 1
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: info: Link 192.168.1.2:192.168.1.2 dead.
> > Apr 20 19:47:03 fw1 heartbeat: [4232]: info: Link 192.168.1.3:192.168.1.3 dead.
> > Apr 20 19:47:03 fw1 heartbeat: [4496]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> > Apr 20 19:47:03 fw1 harc[4496]: info: Running /etc/ha.d/rc.d/status status
> > Apr 20 19:47:03 fw1 heartbeat: [4512]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> > Apr 20 19:47:03 fw1 harc[4512]: info: Running /etc/ha.d/rc.d/status status
> > 
> > 
> > And it stays there doing nothing.
> > 
> > My ha.cf file is as follows:
> > 
> > ucast eth1 10.0.0.1
> > logfile /var/log/ha-log
> > debugfile /var/log/ha-debug
> > keepalive 2
> > warntime 10
> > deadtime 30
> > initdead 120
> > baud 19200
> > udpport 694
> > auto_failback on
> > node fw1
> > node fw3
> > 
> > 
> > respawn hacluster /usr/lib/heartbeat/ipfail
> > ping 192.168.1.2 192.168.1.3
> > crm off
> > 
> > My haresources file is:
> > fw1 192.168.1.100/32/192.168.1.255
> > 
> > I'd appreciate it greatly if someone could point me in the right
> > direction please.
> 
> You need redundant communication for ipfail to work.
> 
> You see, ipfail will only fail over if the two nodes can communicate
> with each other, and agree to move things around.
> 
> What you've done is create a split-brain, where each node thinks
> the other is dead.  If the other is dead, who can take over?
> 
>


Hi Alan,

Many thanks for your quick reply!

I thought that I did have a redundant communication link. The service IP
was on eth0 (192.168.1.0 network) while the heartbeat link was on eth1
(10.0.0.0 network).

After pulling the network cable on eth0, I could still see the heartbeat
packets and their replies on eth1 using tcpdump. With the eth0 cable
still unplugged, if I stop heartbeat on the primary, the secondary
takes over with no problems. This would indicate that the servers are
able to communicate, but the handover doesn't occur if a NIC/cable goes
down on the primary.

I'm going to try to obtain a null modem adaptor for a serial cable
that I'll run between the two Netras.
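
Once the serial cable is in place, the change to ha.cf would presumably
look something like the fragment below. This is only a sketch: the
device name /dev/ttyS0 is an assumption (it depends on which serial port
the cable is plugged into on each Netra), while the baud and ucast lines
are the ones already in the ha.cf quoted above.

```
# Existing heartbeat link over the private eth1 network
ucast eth1 10.0.0.1

# Additional heartbeat link over the null modem cable.
# /dev/ttyS0 is an assumed device name; adjust to the port actually used.
serial /dev/ttyS0
baud 19200
```

With both links configured, heartbeat would have two independent paths
between fw1 and fw3, so losing either one alone should not make the
nodes lose sight of each other.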


Regards,
Faisal

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
