We have an old installation of heartbeat (version 1.2.3) running on SuSE 9.0.
A normal start works fine. But when we tested a takeover by shutting down the active node, the application was started twice on the second node. In messages I found this:

Nov 28 14:22:37 lechz1 ipfail[1456]: debug: Other side is unstable.
Nov 28 14:22:39 lechz1 heartbeat[1424]: info: Received shutdown notice from 'lechz2'.
Nov 28 14:22:39 lechz1 heartbeat[1424]: info: Resources being acquired from lechz2.
Nov 28 14:22:39 lechz1 heartbeat[1424]: debug: StartNextRemoteRscReq(): child count 1
Nov 28 14:22:39 lechz1 heartbeat[1460]: info: acquire all HA resources (standby).
Nov 28 14:22:40 lechz1 heartbeat: info: Acquiring resource group: lechz1 192.168.7.199 telematx.start.communication
Nov 28 14:22:40 lechz1 heartbeat[1424]: debug: StartNextRemoteRscReq(): child count 2
Nov 28 14:22:40 lechz1 heartbeat[1461]: info: Local Resource acquisition completed.
Nov 28 14:22:40 lechz1 heartbeat[1424]: debug: StartNextRemoteRscReq(): child count 1
Nov 28 14:22:40 lechz1 heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 192.168.7.199 start
Nov 28 14:22:40 lechz1 heartbeat: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.7.199 start
Nov 28 14:22:40 lechz1 heartbeat: info: /home/lzgneu/bin/ifconfig eth0:0 192.168.7.199 netmask 255.255.255.0 broadcast 192.168.7.255
Nov 28 14:22:40 lechz1 heartbeat: info: Sending Gratuitous Arp for 192.168.7.199 on eth0:0 [eth0]
Nov 28 14:22:40 lechz1 heartbeat: /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.7.199 eth0 192.168.7.199 auto 192.168.7.199 ffffffffffff
Nov 28 14:22:40 lechz1 heartbeat: debug: /etc/ha.d/resource.d/IPaddr 192.168.7.199 start done. RC=0
Nov 28 14:22:40 lechz1 kernel: send_arp uses obsolete (PF_INET,SOCK_PACKET)
Nov 28 14:22:40 lechz1 heartbeat: info: Running /etc/ha.d/resource.d/telematx.start.communication start
Nov 28 14:22:40 lechz1 heartbeat: debug: Starting /etc/ha.d/resource.d/telematx.start.communication start
Nov 28 14:22:40 lechz1 heartbeat: debug: /etc/ha.d/resource.d/telematx.start.communication start done. RC=0
Nov 28 14:22:40 lechz1 heartbeat[1460]: info: all HA resource acquisition completed (standby).
Nov 28 14:22:40 lechz1 heartbeat[1424]: info: Standby resource acquisition done [all].
Nov 28 14:22:40 lechz1 heartbeat[1659]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Nov 28 14:22:40 lechz1 heartbeat: info: Running /etc/ha.d/rc.d/status status
Nov 28 14:22:40 lechz1 heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Nov 28 14:22:40 lechz1 heartbeat[1424]: info: mach_down takeover complete.
Nov 28 14:22:40 lechz1 heartbeat: info: mach_down takeover complete for node lechz2.
Nov 28 14:22:40 lechz1 heartbeat[1682]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Nov 28 14:22:40 lechz1 heartbeat: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Nov 28 14:22:40 lechz1 heartbeat: received ip-request-resp 192.168.7.199 OK yes
Nov 28 14:22:40 lechz1 heartbeat: info: Acquiring resource group: lechz1 192.168.7.199 telematx.start.communication
Nov 28 14:22:40 lechz1 su: (to lzgneu) root on none
Nov 28 14:22:40 lechz1 su: pam_unix2: session started for user lzgneu, service su
Nov 28 14:22:40 lechz1 heartbeat: info: Running /etc/ha.d/resource.d/telematx.start.communication start
Nov 28 14:22:40 lechz1 heartbeat: debug: Starting /etc/ha.d/resource.d/telematx.start.communication start
Nov 28 14:22:40 lechz1 su: (to lzgneu) root on none
Nov 28 14:22:40 lechz1 su: pam_unix2: session started for user lzgneu, service su
Nov 28 14:22:40 lechz1 heartbeat: debug: /etc/ha.d/resource.d/telematx.start.communication start done. RC=0

As you can see, telematx.start.communication is started twice within the same second, once after each "Acquiring resource group" message. Where should I look for a configuration error?

Regards,
Burkhard
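
For anyone who wants to check a longer log for the same symptom without reading it by eye, the double start can be counted mechanically. The following is only a minimal sketch in Python, assuming syslog lines in exactly the format quoted above ("info: Running /etc/ha.d/resource.d/<script> ... start"); the names count_resource_starts and duplicate_starts are hypothetical helpers for illustration and are not part of heartbeat.

```python
import re
from collections import Counter

# Matches heartbeat "info: Running <resource script> [args] start" lines,
# e.g. "... heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 192.168.7.199 start".
# The pattern is an assumption based on the log excerpt in this post.
RUNNING_START = re.compile(
    r"heartbeat(?:\[\d+\])?:\s+info:\s+Running\s+"
    r"(?P<script>/etc/ha\.d/resource\.d/\S+)(?:\s+.*)?\s+start\s*$"
)


def count_resource_starts(lines):
    """Count how often each resource script is run with 'start'."""
    counts = Counter()
    for line in lines:
        match = RUNNING_START.search(line.strip())
        if match:
            counts[match.group("script")] += 1
    return counts


def duplicate_starts(lines):
    """Return only the resource scripts that were started more than once."""
    return {s: n for s, n in count_resource_starts(lines).items() if n > 1}


# Three "Running ... start" lines taken from the excerpt above.
SAMPLE = [
    "Nov 28 14:22:40 lechz1 heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 192.168.7.199 start",
    "Nov 28 14:22:40 lechz1 heartbeat: info: Running /etc/ha.d/resource.d/telematx.start.communication start",
    "Nov 28 14:22:40 lechz1 heartbeat: info: Running /etc/ha.d/resource.d/telematx.start.communication start",
]

if __name__ == "__main__":
    for script, n in duplicate_starts(SAMPLE).items():
        print(f"{script} started {n} times")
```

To run it against a real log, pass the lines of the file to duplicate_starts instead of SAMPLE. Only the "info: Running" lines are counted, so the matching "debug: Starting" lines do not inflate the result.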
