Good day all,
Here is my setup:
- 2-node cluster
- heartbeat v3.0.0-33.2
- RHEL v5.2
- 2-NIC bond (the issue also happens without the bond configured)

ha.cf: (cluster node host names and IP addresses substituted)

logfacility daemon
keepalive 1
deadtime 10
deadping 10
warntime 5
initdead 120
udpport 694
bcast bond0
ping 1.1.1.1
auto_failback off
node node1
node node2
respawn hacluster /usr/lib64/heartbeat/ipfail
use_logd no

ha.resources: (cluster node host names and IP addresses substituted)

node1 IPaddr::1.1.1.3

My problem:
- Node1 currently owns the IPaddr resource
- Node1 gets disconnected from the network
- Node2 starts up the resource as expected
- Node1 still holds on to the IPaddr resource

Shouldn't node1 release the resource if the ping node (1.1.1.1) is down?

Node1's log: (cluster node host names and IP addresses substituted)
--------------------------------------
Dec 9 18:18:02 node1 ipfail: [17330]: info: Status update: Node 172.20.7.1 now has status dead
Dec 9 18:18:02 node1 heartbeat: [17301]: WARN: node node2: is dead
Dec 9 18:18:02 node1 heartbeat: [17301]: WARN: No STONITH device configured.
Dec 9 18:18:02 node1 heartbeat: [17301]: WARN: Shared disks are not protected.
Dec 9 18:18:02 node1 heartbeat: [17301]: info: Resources being acquired from node2.
Dec 9 18:18:02 node1 heartbeat: [17301]: info: Link 1.1.1.1:1.1.1.1 dead.
Dec 9 18:18:02 node1 heartbeat: [17301]: info: Link node2:bond0 dead.
Dec 9 18:18:02 node1 harc[20177]: info: Running /etc/ha.d/rc.d/status status
Dec 9 18:18:02 node1 IPaddr[20250]: INFO: Running OK
Dec 9 18:18:02 node1 heartbeat: [20178]: info: Local Resource acquisition completed.
Dec 9 18:18:02 node1 ipfail: [17330]: info: NS: We are dead. :<
Dec 9 18:18:02 node1 ipfail: [17330]: info: Status update: Node node2 now has status dead
Dec 9 18:18:02 node1 harc[20279]: info: Running /etc/ha.d/rc.d/status status
Dec 9 18:18:02 node1 mach_down[20299]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec 9 18:18:02 node1 mach_down[20299]: info: mach_down takeover complete for node node2.
Dec 9 18:18:02 node1 heartbeat: [17301]: info: mach_down takeover complete.
Dec 9 18:18:03 node1 ipfail: [17330]: info: NS: We are dead. :<
Dec 9 18:18:03 node1 ipfail: [17330]: info: Link Status update: Link 1.1.1.1/1.1.1.1 now has status dead
Dec 9 18:18:04 node1 ipfail: [17330]: info: We are dead. :<
Dec 9 18:18:04 node1 ipfail: [17330]: info: Asking other side for ping node count.
Dec 9 18:18:04 node1 ipfail: [17330]: info: Link Status update: Link node2/bond0 now has status dead
Dec 9 18:18:05 node1 ipfail: [17330]: info: We are dead. :<
Dec 9 18:18:05 node1 ipfail: [17330]: info: Asking other side for ping node count.
-----------------------------------------

Node2's log: (cluster node host names and IP addresses substituted)
----------------------------------------
Dec 9 18:25:54 node2 heartbeat: [17883]: WARN: node node1: is dead
Dec 9 18:25:54 node2 ipfail: [17915]: info: Status update: Node node1 now has status dead
Dec 9 18:25:54 node2 heartbeat: [17883]: WARN: No STONITH device configured.
Dec 9 18:25:54 node2 heartbeat: [17883]: WARN: Shared disks are not protected.
Dec 9 18:25:54 node2 heartbeat: [17883]: info: Resources being acquired from node1.
Dec 9 18:25:54 node2 heartbeat: [17883]: info: Link node1:bond0 dead.
Dec 9 18:25:54 node2 harc[17957]: info: Running /etc/ha.d/rc.d/status status
Dec 9 18:25:54 node2 heartbeat: [17958]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node2] to acquire.
Dec 9 18:25:54 node2 mach_down[17992]: info: Taking over resource group IPaddr::1.1.1.3
Dec 9 18:25:54 node2 ResourceManager[18022]: info: Acquiring resource group: node1 IPaddr::1.1.1.3
Dec 9 18:25:54 node2 IPaddr[18051]: INFO: Resource is stopped
Dec 9 18:25:54 node2 ResourceManager[18022]: info: Running /etc/ha.d/resource.d/IPaddr 1.1.1.3 start
Dec 9 18:25:54 node2 IPaddr[18114]: INFO: Using calculated nic for 1.1.1.3: bond0
Dec 9 18:25:54 node2 IPaddr[18114]: INFO: Using calculated netmask for 1.1.1.3: 255.255.255.0
Dec 9 18:25:54 node2 IPaddr[18114]: INFO: eval ifconfig bond0:0 1.1.1.3 netmask 255.255.255.0 broadcast 1.1.1.255
Dec 9 18:25:54 node2 IPaddr[18098]: INFO: Success
Dec 9 18:25:54 node2 mach_down[17992]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec 9 18:25:54 node2 mach_down[17992]: info: mach_down takeover complete for node node1.
Dec 9 18:25:54 node2 heartbeat: [17883]: info: mach_down takeover complete.
Dec 9 18:25:55 node2 ipfail: [17915]: info: NS: We are still alive!
Dec 9 18:25:55 node2 ipfail: [17915]: info: Link Status update: Link node1/bond0 now has status dead
Dec 9 18:25:56 node2 ipfail: [17915]: info: Asking other side for ping node count.
Dec 9 18:25:56 node2 ipfail: [17915]: info: Checking remote count of ping nodes.
Dec 9 18:26:04 node2 IPaddr[18114]: ERROR: Could not send gratuitous arps. rc=1
------------------------------------------

Am I doing something wrong? Is anyone else having this issue?

Any help is much appreciated.

Thanks
-Josh
