Hi everyone, I'm running into trouble configuring Heartbeat 2.1.3 on CentOS 5.6. I'm new to Linux high availability in general, so sorry if the answer is obvious. Thanks in advance for taking the time to read my message.
Basically, when I reboot both nodes simultaneously, neither of them brings up the IP address (IPaddr2::172.22.4.1/24/eth0). However, as soon as I shut down one of the nodes while both are running, the other one takes over and brings up the IP address. Here are my configuration files and some logs:

# [/etc/ha.d/ha.cf] on both nodes:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
auto_failback on
udpport 694
bcast eth0
node PBX02.BRM
node PBX03.BRM

# [/etc/ha.d/haresources] on both nodes:

PBX02.BRM IPaddr2::172.22.4.1/24/eth0 asterisk

# [/var/log/ha-log] PROBLEMATIC SITUATION
# This is what happens from the very beginning of a simultaneous boot of my nodes. Neither node brings up the IP address.

# SYSTEM A: PBX02.BRM

heartbeat[3151]: 2011/07/08_11:34:50 info: Version 2 support: false
heartbeat[3151]: 2011/07/08_11:34:50 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[3151]: 2011/07/08_11:34:50 info: **************************
heartbeat[3151]: 2011/07/08_11:34:50 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[3152]: 2011/07/08_11:34:50 info: heartbeat: version 2.1.3
heartbeat[3152]: 2011/07/08_11:34:50 info: Heartbeat generation: 1310070701
heartbeat[3152]: 2011/07/08_11:34:50 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[3152]: 2011/07/08_11:34:50 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[3152]: 2011/07/08_11:34:50 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[3152]: 2011/07/08_11:34:50 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[3152]: 2011/07/08_11:34:50 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[3152]: 2011/07/08_11:34:50 info: Local status now set to: 'up'
heartbeat[3152]: 2011/07/08_11:34:51 info: Link pbx02.brm:eth0 up.
heartbeat[3152]: 2011/07/08_11:34:52 info: Link pbx03.brm:eth0 up.
heartbeat[3152]: 2011/07/08_11:34:52 info: Status update for node pbx03.brm: status active
heartbeat[3152]: 2011/07/08_11:34:52 info: Comm_now_up(): updating status to active
heartbeat[3152]: 2011/07/08_11:34:52 info: Local status now set to: 'active'
harc[3287]: 2011/07/08_11:34:52 info: Running /etc/ha.d/rc.d/status status
heartbeat[3152]: 2011/07/08_11:35:03 info: local resource transition completed.
heartbeat[3152]: 2011/07/08_11:35:03 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[3651]: 2011/07/08_11:35:03 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys pbx02.brm] to acquire.
heartbeat[3152]: 2011/07/08_11:35:04 info: remote resource transition completed.

# SYSTEM B: PBX03.BRM

heartbeat[3151]: 2011/07/08_11:34:48 info: Version 2 support: false
heartbeat[3151]: 2011/07/08_11:34:48 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[3151]: 2011/07/08_11:34:48 info: **************************
heartbeat[3151]: 2011/07/08_11:34:48 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[3152]: 2011/07/08_11:34:48 info: heartbeat: version 2.1.3
heartbeat[3152]: 2011/07/08_11:34:48 info: Heartbeat generation: 1310070791
heartbeat[3152]: 2011/07/08_11:34:48 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[3152]: 2011/07/08_11:34:48 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[3152]: 2011/07/08_11:34:48 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[3152]: 2011/07/08_11:34:48 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[3152]: 2011/07/08_11:34:48 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[3152]: 2011/07/08_11:34:48 info: Local status now set to: 'up'
heartbeat[3152]: 2011/07/08_11:34:49 info: Link pbx03.brm:eth0 up.
heartbeat[3152]: 2011/07/08_11:34:52 info: Link pbx02.brm:eth0 up.
heartbeat[3152]: 2011/07/08_11:34:52 info: Status update for node pbx02.brm: status up
harc[3416]: 2011/07/08_11:34:52 info: Running /etc/ha.d/rc.d/status status
heartbeat[3152]: 2011/07/08_11:34:52 info: Comm_now_up(): updating status to active
heartbeat[3152]: 2011/07/08_11:34:52 info: Local status now set to: 'active'
heartbeat[3152]: 2011/07/08_11:34:53 info: Status update for node pbx02.brm: status active
harc[3435]: 2011/07/08_11:34:53 info: Running /etc/ha.d/rc.d/status status
heartbeat[3152]: 2011/07/08_11:35:04 info: remote resource transition completed.
heartbeat[3152]: 2011/07/08_11:35:04 info: remote resource transition completed.
heartbeat[3152]: 2011/07/08_11:35:04 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[3688]: 2011/07/08_11:35:04 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys pbx03.brm] to acquire.

# [/var/log/ha-log] WORKING STATE
# This is what happens if I shut down one of the two nodes. The IP address is brought up instantly.

# SYSTEM A: PBX02.BRM when PBX03.BRM is shut down (note that it also works when I force a shutdown of the other node)

heartbeat[3152]: 2011/07/08_11:45:41 info: Received shutdown notice from 'pbx03.brm'.
heartbeat[3152]: 2011/07/08_11:45:41 info: Resources being acquired from pbx03.brm.
heartbeat[4237]: 2011/07/08_11:45:41 info: acquire local HA resources (standby).
heartbeat[4238]: 2011/07/08_11:45:41 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys pbx02.brm] to acquire.
heartbeat[4237]: 2011/07/08_11:45:41 info: local HA resource acquisition completed (standby).
heartbeat[3152]: 2011/07/08_11:45:41 info: Standby resource acquisition done [all].
harc[4263]: 2011/07/08_11:45:41 info: Running /etc/ha.d/rc.d/status status
mach_down[4279]: 2011/07/08_11:45:41 info: Taking over resource group IPaddr2::172.22.4.1/24/eth0
ResourceManager[4305]: 2011/07/08_11:45:41 info: Acquiring resource group: pbx03.brm IPaddr2::172.22.4.1/24/eth0 asterisk
IPaddr2[4332]: 2011/07/08_11:45:41 INFO: Resource is stopped
ResourceManager[4305]: 2011/07/08_11:45:41 info: Running /etc/ha.d/resource.d/IPaddr2 172.22.4.1/24/eth0 start
IPaddr2[4444]: 2011/07/08_11:45:41 INFO: ip -f inet addr add 172.22.4.1/24 brd 172.22.4.255 dev eth0
IPaddr2[4444]: 2011/07/08_11:45:41 INFO: ip link set eth0 up
IPaddr2[4444]: 2011/07/08_11:45:41 INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-172.22.4.1 eth0 172.22.4.1 auto not_used not_used
IPaddr2[4415]: 2011/07/08_11:45:41 INFO: Success
mach_down[4279]: 2011/07/08_11:45:41 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[4279]: 2011/07/08_11:45:41 info: mach_down takeover complete for node pbx03.brm.
heartbeat[3152]: 2011/07/08_11:45:41 info: mach_down takeover complete.

Thank you very much,
--
Gregory A. Lussier

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
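For readers decoding the haresources line and the IPaddr2 entries in the working-state log, here is a minimal sketch of how the pieces appear to map onto each other. The helpers parse_haresources_line and ipaddr2_add_command are hypothetical illustrations, not part of Heartbeat. They assume the haresources fields are whitespace-separated with the node name first, that resource arguments follow the type after "::", and that the IPaddr2 argument has the form address/prefix/interface (as in 172.22.4.1/24/eth0). The real IPaddr2 agent does more than this, for example the send_arp call visible in the log.

```python
import ipaddress


def parse_haresources_line(line):
    """Split a haresources line into (node, [(type, args), ...]).

    Assumption: fields are whitespace-separated, the first field is the
    node name, and each resource is "Type" or "Type::arg1::arg2...".
    """
    fields = line.split()
    node, resources = fields[0], []
    for spec in fields[1:]:
        parts = spec.split("::")
        # parts[0] is the resource type; any remaining parts are its arguments
        resources.append((parts[0], parts[1:]))
    return node, resources


def ipaddr2_add_command(arg):
    """Rebuild the 'ip addr add' string seen in the log for an IPaddr2 argument.

    Assumption: the argument has the form "address/prefix/interface".
    """
    address, prefix, iface = arg.split("/")
    # strict=False lets us pass a host address; the broadcast comes from the prefix
    network = ipaddress.IPv4Network(f"{address}/{prefix}", strict=False)
    return (f"ip -f inet addr add {address}/{prefix} "
            f"brd {network.broadcast_address} dev {iface}")


if __name__ == "__main__":
    node, resources = parse_haresources_line(
        "PBX02.BRM IPaddr2::172.22.4.1/24/eth0 asterisk")
    print(node)
    for rtype, args in resources:
        print(rtype, args)
        if rtype == "IPaddr2":
            print(ipaddr2_add_command(args[0]))
```

Run against the haresources line from this post, the IPaddr2 entry yields the same command string as the IPaddr2[4444] log line above, including the broadcast address 172.22.4.255 derived from the /24 prefix.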
