I have a two-server Active/Passive cluster. It's running on RHEL 4 WS, installed from RPMs. I see that st002 sees the first server as down, but I'm not sure why. Both servers have iptables shut down.

Chad

The RPMs are:
heartbeat-stonith-2.0.7-1.c4
heartbeat-ldirectord-2.0.7-1.c4
heartbeat-pils-2.0.7-1.c4
heartbeat-2.0.7-1.c4

ha.cf:
logfacility local0
keepalive 1
deadtime 10
warntime 5
initdead 20
udpport 694
mcast eth0 225.0.0.1 694 1 0
auto_failback off
node st001
node st002
ping 10.10.10.1
respawn hacluster /usr/lib/heartbeat/ipfail
crm no

haresources:
st001 IPaddr2::10.10.10.9/24/eth0/10.10.10.255

Jul 12 18:24:16 st001 logd: [17111]: info: logd started with default configuration.
Jul 12 18:24:16 st001 logd: [17112]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jul 12 18:24:16 st001 logd: [17111]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jul 12 18:24:16 st001 heartbeat: [17263]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jul 12 18:24:16 st001 heartbeat: [17263]: info: **************************
Jul 12 18:24:16 st001 heartbeat: [17263]: info: Configuration validated. Starting heartbeat 2.0.7
Jul 12 18:24:16 st001 heartbeat: [17264]: info: heartbeat: version 2.0.7
Jul 12 18:24:17 st001 heartbeat: [17264]: info: Heartbeat generation: 14
Jul 12 18:24:17 st001 heartbeat: [17264]: info: G_main_add_TriggerHandler: Added signal manual handler
Jul 12 18:24:17 st001 heartbeat: [17264]: info: G_main_add_TriggerHandler: Added signal manual handler
Jul 12 18:24:17 st001 heartbeat: [17264]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Jul 12 18:24:17 st001 heartbeat: [17264]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
Jul 12 18:24:17 st001 heartbeat: [17264]: info: glib: ping heartbeat started.
Jul 12 18:24:17 st001 heartbeat: [17264]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jul 12 18:24:17 st001 heartbeat: [17264]: info: Local status now set to: 'up'
Jul 12 18:24:18 st001 heartbeat: [17264]: info: Link 10.10.10.1:10.10.10.1 up.
Jul 12 18:24:18 st001 heartbeat: [17264]: info: Status update for node 10.10.10.1: status ping
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: node st002: is dead
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Comm_now_up(): updating status to active
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Local status now set to: 'active'
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Starting child client "/usr/lib/heartbeat/ipfail" (501,501)
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: No STONITH device configured.
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: Shared disks are not protected.
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Resources being acquired from st002.
Jul 12 18:24:37 st001 heartbeat: [17273]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 501 gid 501 (pid 17273)
Jul 12 18:24:37 st001 harc[17274]: info: Running /etc/ha.d/rc.d/status status
Jul 12 18:24:37 st001 mach_down[17295]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Jul 12 18:24:37 st001 mach_down[17295]: info: mach_down takeover complete for node st002.
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Initial resource acquisition complete (T_RESOURCES(us))
Jul 12 18:24:37 st001 heartbeat: [17264]: info: mach_down takeover complete.
Jul 12 18:24:37 st001 IPaddr2[17322]: INFO: IPaddr2 Resource is stopped
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 20 ms (> 10 ms) (GSource: 0x9fce078)
Jul 12 18:24:37 st001 heartbeat: [17275]: info: Local Resource acquisition completed.
Jul 12 18:24:37 st001 harc[17451]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Jul 12 18:24:38 st001 ip-request-resp[17451]: received ip-request-resp IPaddr2::10.10.10.9/24/eth0/10.10.10.255 OK yes
Jul 12 18:24:38 st001 ResourceManager[17466]: info: Acquiring resource group: st001 IPaddr2::10.10.10.9/24/eth0/10.10.10.255
Jul 12 18:24:38 st001 IPaddr2[17490]: INFO: IPaddr2 Resource is stopped
Jul 12 18:24:38 st001 ResourceManager[17466]: info: Running /etc/ha.d/resource.d/IPaddr2 10.10.10.9/24/eth0/10.10.10.255 start
Jul 12 18:24:38 st001 IPaddr2[17704]: INFO: /sbin/ip -f inet addr add 10.10.10.9/24 brd 10.10.10.255 dev eth0
Jul 12 18:24:38 st001 IPaddr2[17704]: INFO: /sbin/ip link set eth0 up
Jul 12 18:24:38 st001 IPaddr2[17704]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.10.10.9 eth0 10.10.10.9 auto 10.10.10.9 ffffffffffff
Jul 12 18:24:38 st001 IPaddr2[17622]: INFO: IPaddr2 Success
Jul 12 18:24:48 st001 heartbeat: [17264]: info: Local Resource acquisition completed. (none)
Jul 12 18:24:48 st001 heartbeat: [17264]: info: local resource transition completed.

When I start st002 I get:

Jul 12 18:25:42 st001 heartbeat: [17264]: info: Link st002:eth0 up.
Jul 12 18:25:42 st001 heartbeat: [17264]: info: Status update for node st002: status init
Jul 12 18:25:42 st001 ipfail: [17273]: info: Link Status update: Link st002/eth0 now has status up
Jul 12 18:25:42 st001 ipfail: [17273]: info: Status update: Node st002 now has status init
Jul 12 18:25:42 st001 heartbeat: [17264]: info: Status update for node st002: status up
Jul 12 18:25:42 st001 ipfail: [17273]: info: Status update: Node st002 now has status up
Jul 12 18:25:42 st001 harc[17757]: info: Running /etc/ha.d/rc.d/status status
Jul 12 18:25:42 st001 harc[17768]: info: Running /etc/ha.d/rc.d/status status
Jul 12 18:25:44 st001 heartbeat: [17264]: info: all clients are now paused
Jul 12 18:26:01 st001 heartbeat: [17264]: WARN: 1 lost packet(s) for [st002] [24:26]
Jul 12 18:26:01 st001 heartbeat: [17264]: info: Status update for node st002: status active
Jul 12 18:26:01 st001 ipfail: [17273]: info: Status update: Node st002 now has status active
Jul 12 18:26:01 st001 heartbeat: [17264]: info: No pkts missing from st002!
Jul 12 18:26:01 st001 heartbeat: [17264]: info: remote resource transition completed.
Jul 12 18:26:02 st001 harc[17778]: info: Running /etc/ha.d/rc.d/status status
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own our resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own foreign resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own our resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own foreign resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 90 ms (> 50 ms) (GSource: 0x9fd4530)
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own our resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own foreign resources!
Jul 12 18:26:12 st001 heartbeat: [17264]: ERROR: Both machines own our resources!
Jul 12 18:26:12 st001 heartbeat: [17264]: ERROR: Both machines own foreign resources!

Server number two:

Jul 12 18:25:40 st002 logd: [9578]: info: logd started with default configuration.
Jul 12 18:25:40 st002 logd: [9579]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jul 12 18:25:40 st002 logd: [9578]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jul 12 18:25:40 st002 heartbeat: [9730]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jul 12 18:25:40 st002 heartbeat: [9730]: info: **************************
Jul 12 18:25:40 st002 heartbeat: [9730]: info: Configuration validated. Starting heartbeat 2.0.7
Jul 12 18:25:40 st002 heartbeat: [9731]: info: heartbeat: version 2.0.7
Jul 12 18:25:41 st002 heartbeat: [9731]: info: Heartbeat generation: 9
Jul 12 18:25:41 st002 heartbeat: [9731]: info: G_main_add_TriggerHandler: Added signal manual handler
Jul 12 18:25:41 st002 heartbeat: [9731]: info: G_main_add_TriggerHandler: Added signal manual handler
Jul 12 18:25:41 st002 heartbeat: [9731]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Jul 12 18:25:41 st002 heartbeat: [9731]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
Jul 12 18:25:41 st002 heartbeat: [9731]: info: glib: ping heartbeat started.
Jul 12 18:25:41 st002 heartbeat: [9731]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jul 12 18:25:41 st002 heartbeat: [9731]: info: Local status now set to: 'up'
Jul 12 18:25:42 st002 heartbeat: [9731]: info: Link 10.10.10.1:10.10.10.1 up.
Jul 12 18:25:42 st002 heartbeat: [9731]: info: Status update for node 10.10.10.1: status ping
Jul 12 18:26:01 st002 heartbeat: [9731]: WARN: node st001: is dead
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Comm_now_up(): updating status to active
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Local status now set to: 'active'
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Starting child client "/usr/lib/heartbeat/ipfail" (501,501)
Jul 12 18:26:01 st002 heartbeat: [9731]: WARN: No STONITH device configured.
Jul 12 18:26:01 st002 heartbeat: [9731]: WARN: Shared disks are not protected.
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Resources being acquired from st001.
Jul 12 18:26:01 st002 heartbeat: [9740]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 501 gid 501 (pid 9740)
Jul 12 18:26:01 st002 harc[9741]: info: Running /etc/ha.d/rc.d/status status
Jul 12 18:26:01 st002 heartbeat: [9742]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys st002] to acquire.
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Initial resource acquisition complete (T_RESOURCES(us))
Jul 12 18:26:01 st002 mach_down[9761]: info: Taking over resource group IPaddr2::10.10.10.9/24/eth0/10.10.10.255
Jul 12 18:26:01 st002 ResourceManager[9781]: info: Acquiring resource group: st001 IPaddr2::10.10.10.9/24/eth0/10.10.10.255
Jul 12 18:26:01 st002 IPaddr2[9805]: INFO: IPaddr2 Resource is stopped
Jul 12 18:26:01 st002 ResourceManager[9781]: info: Running /etc/ha.d/resource.d/IPaddr2 10.10.10.9/24/eth0/10.10.10.255 start
Jul 12 18:26:02 st002 IPaddr2[10019]: INFO: /sbin/ip -f inet addr add 10.10.10.9/24 brd 10.10.10.255 dev eth0
Jul 12 18:26:02 st002 IPaddr2[10019]: INFO: /sbin/ip link set eth0 up
Jul 12 18:26:02 st002 IPaddr2[10019]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.10.10.9 eth0 10.10.10.9 auto 10.10.10.9 ffffffffffff
Jul 12 18:26:02 st002 IPaddr2[9937]: INFO: IPaddr2 Success
Jul 12 18:26:02 st002 mach_down[9761]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Jul 12 18:26:02 st002 mach_down[9761]: info: mach_down takeover complete for node st001.
Jul 12 18:26:02 st002 heartbeat: [9731]: info: mach_down takeover complete.
Jul 12 18:26:12 st002 heartbeat: [9731]: info: Local Resource acquisition completed. (none)
Jul 12 18:26:12 st002 heartbeat: [9731]: info: local resource transition completed.
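
For anyone comparing the two logs side by side, the key events are the "is dead" warnings each node raises about the other. Below is a minimal sketch of a script that pulls those out of syslog-style lines like the ones quoted here. The function name `dead_events` and the regular expressions are my own assumptions for illustration, not part of heartbeat; they are only matched against the log format shown in this message.

```python
import re
from datetime import datetime

# Matches syslog-style lines such as:
#   Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: node st002: is dead
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<rest>.*)$"
)
# Hypothetical pattern for the warning text seen in the logs above.
DEAD_RE = re.compile(r"WARN: node (?P<peer>\S+): is dead")


def dead_events(lines):
    """Return (time, reporting_host, peer) for every 'is dead' warning."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timestamped log line
        dead = DEAD_RE.search(m.group("rest"))
        if dead:
            # The logs carry no year, so only the time of day is parsed.
            clock = m.group("ts").split()[-1]
            ts = datetime.strptime(clock, "%H:%M:%S").time()
            events.append((ts, m.group("host"), dead.group("peer")))
    return events


if __name__ == "__main__":
    sample = [
        "Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: node st002: is dead",
        "Jul 12 18:26:01 st002 heartbeat: [9731]: WARN: node st001: is dead",
    ]
    for ts, host, peer in dead_events(sample):
        print(f"{ts} {host} declared {peer} dead")
```

Run against the two lines in `sample`, it reports that st001 declared st002 dead at 18:24:37 and st002 declared st001 dead at 18:26:01, which matches what the full logs show.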
