Re: [Linux-HA] ipfail not failing over when ping nodes are unpingable

Alan Robertson Fri, 20 Apr 2007 12:57:21 -0700

Faisal Shaikh wrote:
> Hi all,
> 
> I'm having trouble setting up a pair of machines with a single
> resource (an IP address) to fail over between them.
> The machines are Sun Netras (T1 105) running Gentoo Linux.
> 
> The scenario is as follows:
> fw1: (primary resource holder)
>       eth0: 192.168.1.52
>       eth1: 10.0.0.2
>       
> fw3: (secondary resource holder)
>       eth0: 192.168.1.60
>       eth1: 10.0.0.1
> 
> 
> eth1 is used as a private network between these two machines for the
> heartbeat. (I haven't got the correct cable type to use the serial
> connection for the heartbeat.)
> 
> 
> The IP address fails over correctly in the following cases:
> 
> 1. When I switch off the primary resource holder.
> 2. When I stop heartbeat on the primary resource holder.
> 
> However, if I disconnect the primary resource holder from the network so
> that it can't ping the ping nodes, the IP address does not fail over to
> the secondary resource holder.
> 
> After disconnecting the cable, the log entries on the primary resource
> holder are as follows:
> 
> Apr 20 19:47:03 fw1 heartbeat: [4232]: WARN: node 192.168.1.2: is dead
> Apr 20 19:47:03 fw1 heartbeat: [4232]: WARN: node 192.168.1.3: is dead
> Apr 20 19:47:03 fw1 heartbeat: [4232]: debug: StartNextRemoteRscReq(): child count 1
> Apr 20 19:47:03 fw1 heartbeat: [4232]: info: Link 192.168.1.2:192.168.1.2 dead.
> Apr 20 19:47:03 fw1 heartbeat: [4232]: info: Link 192.168.1.3:192.168.1.3 dead.
> Apr 20 19:47:03 fw1 heartbeat: [4496]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> Apr 20 19:47:03 fw1 harc[4496]: info: Running /etc/ha.d/rc.d/status status
> Apr 20 19:47:03 fw1 heartbeat: [4512]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
> Apr 20 19:47:03 fw1 harc[4512]: info: Running /etc/ha.d/rc.d/status status
> 
> 
> And it stays there doing nothing.
> 
> My ha.cf file is as follows:
> 
> ucast eth1 10.0.0.1
> logfile /var/log/ha-log
> debugfile /var/log/ha-debug
> keepalive 2
> warntime 10
> deadtime 30
> initdead 120
> baud 19200
> udpport 694
> auto_failback on
> node fw1
> node fw3
> 
> 
> respawn hacluster /usr/lib/heartbeat/ipfail
> ping 192.168.1.2 192.168.1.3
> crm off
> 
> My haresources file is:
> fw1 192.168.1.100/32/192.168.1.255
> 
> I'd appreciate it greatly if someone could point me in the right
> direction please.


You need redundant communication for ipfail to work.

You see, ipfail will only fail over if the two nodes can communicate
with each other, and agree to move things around.

What you've created is a split-brain, where each node thinks the other
is dead.  If the other is dead, who can take over?
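
As a rough sketch of what redundant communication could look like, and
assuming the public LAN on eth0 is acceptable as a second heartbeat
path, the ha.cf on fw1 might carry two ucast lines instead of one. The
addresses are taken from the original post; the second line is an
assumption, not part of the posted configuration.

```
# ha.cf on fw1 (sketch only)
ucast eth1 10.0.0.1       # existing private heartbeat link to fw3
ucast eth0 192.168.1.60   # assumed second path: fw3 over the public LAN
```

fw3 would need the mirror-image lines pointing at fw1's addresses
(10.0.0.2 and 192.168.1.52). A serial link would be another option once
the right cable is available.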


-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
