We are running heartbeat 2.1.3 on CentOS 5.4. Last Monday morning, I received a call while getting ready for work: our high-availability server was not responding. The previous Saturday, our I.T. admins had reconfigured the network to expand the IP address ranges on some subnets. For whatever reason, this caused our main server (in a two-node HA configuration) to lose its virtual interface, rendering our high-availability server unavailable.
The network itself worked fine; the nodes could ping each other on their normal IPs and they could ping the ping node, but the virtual IP (the one we REALLY care about) did not respond. Nothing in the logs, no errors, nothing. Just an unresponsive virtual server. I.T. had done their work on Saturday and, had I checked our server on Sunday, I would have found it "unreachable" with a normal ping.

When my colleague called me, I asked him what "ifconfig" looked like. He described three interfaces: eth0, eth1 and lo; no eth0:0. I had him initiate a manual fail-over, which brought the service back quickly as the backup took over.

After poring over the logs, unable to find anything that indicated a problem, I tried to simulate the problem with "ifconfig eth0:0 down". Sure enough, no fail-over, no errors, nothing; just (once again) an unresponsive server. "ifconfig eth0:0 <IP_ADDRESS> up" brought it right back (I tried this last Saturday, BTW, when no one was working).

It seems that heartbeat (ipfail?) creates this virtual interface when it starts, then forgets about it. I presume the assumption is that if eth0 remains intact, eth0:0 will remain intact as well. Am I missing something in the configuration settings or docs? I find nothing about configuring the backup node to monitor the virtual address, just the other node (which has a different IP and kept working after the network changes). I am about to set up a service to monitor the virtual IP, but I wanted to check with the list first, to see if there is already something built in that I have not configured correctly.

I have used main.company.com and backup.company.com as the hostnames of the two nodes. Both systems have these names in /etc/hosts, along with the hostname and IP of the virtual server and the ping node.
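For reference, here is a rough sketch of the kind of monitor I have in mind. It is untested against a live cluster and rests on several assumptions of mine: that the "ip" command is available, that /usr0 being mounted is a fair sign that this node currently holds the resources, and that hb_standby lives at /usr/lib/heartbeat/hb_standby (it may be under /usr/lib64 on some installs). The function names and the "vip-watch" log tag are made up for illustration.

```shell
#!/bin/sh
# Hypothetical watchdog sketch, not part of heartbeat itself.

VIP="129.196.140.14"                         # virtual IP from haresources
MOUNTPOINT="/usr0"                           # Filesystem resource from haresources
HB_STANDBY="/usr/lib/heartbeat/hb_standby"   # assumed path; adjust for your install

# Succeeds if address $1 appears in `ip -o -4 addr show` style text on stdin.
# The fixed-string match on "inet <addr>/" avoids partial matches such as
# 129.196.140.1 matching 129.196.140.13.
vip_present() {
    grep -qF "inet $1/"
}

# Succeeds if mount point $1 appears in /proc/mounts style text on stdin.
is_mounted() {
    grep -qF " $1 "
}

# One pass of the check: do nothing unless this node holds the resources
# but has lost the virtual IP; in that case log it and request a fail-over.
check_vip() {
    is_mounted "$MOUNTPOINT" < /proc/mounts || return 0
    if ip -o -4 addr show | vip_present "$VIP"; then
        return 0
    fi
    logger -t vip-watch "virtual IP $VIP missing on active node; going standby"
    "$HB_STANDBY"
}
```

The idea would be to source this from a small wrapper that calls check_vip and to run that wrapper from cron every minute on both nodes. On the passive node /usr0 is not mounted, so the check does nothing there.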
My configuration:

/etc/ha.d/ha.cf:

    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    logfacility local0
    keepalive 2
    deadtime 10
    warntime 3
    initdead 120
    udpport 694
    baud 9600
    serial /dev/ttyS0
    ucast eth1 10.0.0.1
    ucast eth1 10.0.0.2
    auto_failback off
    node main.company.com backup.company.com
    ping 129.196.140.130
    respawn hacluster /usr/lib/heartbeat/ipfail
    deadping 10

/etc/ha.d/haresources:

    main.company.com drbddisk::drbd_resource_0 Filesystem::/dev/drbd0::/usr0::ext3 mysql IPaddr::129.196.140.14 httpd smb MailTo::root

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
