I have a two-server active/passive cluster.
It's running on RHEL 4 WS, installed from RPMs.
 
I see that st002 sees the first server (st001) as down, but I'm not sure why.
iptables is shut down on both servers.
 
Chad
 
The RPMs are:
heartbeat-stonith-2.0.7-1.c4
heartbeat-ldirectord-2.0.7-1.c4
heartbeat-pils-2.0.7-1.c4
heartbeat-2.0.7-1.c4

ha.cf:
logfacility     local0
keepalive       1
deadtime        10
warntime        5
initdead        20
udpport         694
mcast eth0 225.0.0.1 694 1 0
auto_failback off
node            st001
node            st002
ping            10.10.10.1
respawn hacluster /usr/lib/heartbeat/ipfail
crm no
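
As a quick sanity check on the timing directives above, here is a minimal Python sketch. It assumes the values are plain integer seconds (no "ms" suffix) and applies the commonly cited guideline that initdead should be at least twice deadtime. The function names are hypothetical and not part of heartbeat.

```python
# Sketch only: parse the timing directives from ha.cf-style text and
# check their relative ordering. Values are assumed to be whole seconds.
HA_CF = """\
keepalive       1
deadtime        10
warntime        5
initdead        20
"""

TIMING_KEYS = ("keepalive", "warntime", "deadtime", "initdead")


def parse_timings(text):
    """Return {directive: seconds} for the timing directives found in text."""
    timings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] in TIMING_KEYS:
            timings[parts[0]] = int(parts[1])
    return timings


def check_timings(t):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not t["keepalive"] < t["warntime"] < t["deadtime"]:
        problems.append("expected keepalive < warntime < deadtime")
    if t["initdead"] < 2 * t["deadtime"]:
        problems.append("initdead should be at least twice deadtime")
    return problems


timings = parse_timings(HA_CF)
print(timings, check_timings(timings) or "timings look consistent")
```

With the values posted here the check reports no problems, so the timing settings themselves do not look like the cause.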

haresources:
st001 IPaddr2::10.10.10.9/24/eth0/10.10.10.255
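
For reference, the haresources line above can be broken into its parts with a short Python sketch. It assumes the usual "node script::argument" layout and that the IPaddr2 argument carries all four fields (address, prefix length, interface, broadcast), which matches the `ip addr add` command in the logs below. The helper names are hypothetical.

```python
# Sketch only: split a haresources line into the preferred node and its
# resources, then split an IPaddr2 argument into named fields.
def parse_haresources_line(line):
    """Return (node, [(script, [args, ...]), ...]) for one haresources line."""
    fields = line.split()
    node, resources = fields[0], fields[1:]
    parsed = []
    for res in resources:
        script, *args = res.split("::")
        parsed.append((script, args))
    return node, parsed


def parse_ipaddr2_arg(arg):
    """Split 'ip/prefix/interface/broadcast' (all four fields assumed present)."""
    ip, prefix, iface, broadcast = arg.split("/")
    return {"ip": ip, "prefix": int(prefix), "iface": iface, "broadcast": broadcast}


node, resources = parse_haresources_line(
    "st001 IPaddr2::10.10.10.9/24/eth0/10.10.10.255"
)
script, args = resources[0]
print(node, script, parse_ipaddr2_arg(args[0]))
```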
 
Jul 12 18:24:16 st001 logd: [17111]: info: logd started with default
configuration.
Jul 12 18:24:16 st001 logd: [17112]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Jul 12 18:24:16 st001 logd: [17111]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Jul 12 18:24:16 st001 heartbeat: [17263]: WARN: Logging daemon is
disabled --enabling logging daemon is recommended
Jul 12 18:24:16 st001 heartbeat: [17263]: info:
**************************
Jul 12 18:24:16 st001 heartbeat: [17263]: info: Configuration validated.
Starting heartbeat 2.0.7
Jul 12 18:24:16 st001 heartbeat: [17264]: info: heartbeat: version 2.0.7
Jul 12 18:24:17 st001 heartbeat: [17264]: info: Heartbeat generation: 14
Jul 12 18:24:17 st001 heartbeat: [17264]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jul 12 18:24:17 st001 heartbeat: [17264]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jul 12 18:24:17 st001 heartbeat: [17264]: info: Removing
/var/run/heartbeat/rsctmp failed, recreating.
Jul 12 18:24:17 st001 heartbeat: [17264]: info: glib: UDP multicast
heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1
loop=0)
Jul 12 18:24:17 st001 heartbeat: [17264]: info: glib: ping heartbeat
started.
Jul 12 18:24:17 st001 heartbeat: [17264]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jul 12 18:24:17 st001 heartbeat: [17264]: info: Local status now set to:
'up'
Jul 12 18:24:18 st001 heartbeat: [17264]: info: Link
10.10.10.1:10.10.10.1 up.
Jul 12 18:24:18 st001 heartbeat: [17264]: info: Status update for node
10.10.10.1: status ping
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: node st002: is dead
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Comm_now_up(): updating
status to active
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Local status now set to:
'active'
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Starting child client
"/usr/lib/heartbeat/ipfail" (501,501)
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: No STONITH device
configured.
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: Shared disks are not
protected.
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Resources being acquired
from st002.
Jul 12 18:24:37 st001 heartbeat: [17273]: info: Starting
"/usr/lib/heartbeat/ipfail" as uid 501  gid 501 (pid 17273)
Jul 12 18:24:37 st001 harc[17274]: info: Running /etc/ha.d/rc.d/status
status
Jul 12 18:24:37 st001 mach_down[17295]: info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Jul 12 18:24:37 st001 mach_down[17295]: info: mach_down takeover
complete for node st002.
Jul 12 18:24:37 st001 heartbeat: [17264]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Jul 12 18:24:37 st001 heartbeat: [17264]: info: mach_down takeover
complete.
Jul 12 18:24:37 st001 IPaddr2[17322]: INFO: IPaddr2 Resource is stopped
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: G_SIG_dispatch: Dispatch
function for SIGCHLD took too long to execute: 20 ms (> 10 ms) (GSource:
0x9fce078)
Jul 12 18:24:37 st001 heartbeat: [17275]: info: Local Resource
acquisition completed.
Jul 12 18:24:37 st001 harc[17451]: info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
Jul 12 18:24:38 st001 ip-request-resp[17451]: received ip-request-resp
IPaddr2::10.10.10.9/24/eth0/10.10.10.255 OK yes
Jul 12 18:24:38 st001 ResourceManager[17466]: info: Acquiring resource
group: st001 IPaddr2::10.10.10.9/24/eth0/10.10.10.255
Jul 12 18:24:38 st001 IPaddr2[17490]: INFO: IPaddr2 Resource is stopped
Jul 12 18:24:38 st001 ResourceManager[17466]: info: Running
/etc/ha.d/resource.d/IPaddr2 10.10.10.9/24/eth0/10.10.10.255 start
Jul 12 18:24:38 st001 IPaddr2[17704]: INFO: /sbin/ip -f inet addr add
10.10.10.9/24 brd 10.10.10.255 dev eth0
Jul 12 18:24:38 st001 IPaddr2[17704]: INFO: /sbin/ip link set eth0 up
Jul 12 18:24:38 st001 IPaddr2[17704]: INFO: /usr/lib/heartbeat/send_arp
-i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.10.10.9
eth0 10.10.10.9 auto 10.10.10.9 ffffffffffff
Jul 12 18:24:38 st001 IPaddr2[17622]: INFO: IPaddr2 Success
Jul 12 18:24:48 st001 heartbeat: [17264]: info: Local Resource
acquisition completed. (none)
Jul 12 18:24:48 st001 heartbeat: [17264]: info: local resource
transition completed.
 

When I start st002 I get:
 
Jul 12 18:25:42 st001 heartbeat: [17264]: info: Link st002:eth0 up.
Jul 12 18:25:42 st001 heartbeat: [17264]: info: Status update for node
st002: status init
Jul 12 18:25:42 st001 ipfail: [17273]: info: Link Status update: Link
st002/eth0 now has status up
Jul 12 18:25:42 st001 ipfail: [17273]: info: Status update: Node st002
now has status init
Jul 12 18:25:42 st001 heartbeat: [17264]: info: Status update for node
st002: status up
Jul 12 18:25:42 st001 ipfail: [17273]: info: Status update: Node st002
now has status up
Jul 12 18:25:42 st001 harc[17757]: info: Running /etc/ha.d/rc.d/status
status
Jul 12 18:25:42 st001 harc[17768]: info: Running /etc/ha.d/rc.d/status
status
Jul 12 18:25:44 st001 heartbeat: [17264]: info: all clients are now
paused
Jul 12 18:26:01 st001 heartbeat: [17264]: WARN: 1 lost packet(s) for
[st002] [24:26]
Jul 12 18:26:01 st001 heartbeat: [17264]: info: Status update for node
st002: status active
Jul 12 18:26:01 st001 ipfail: [17273]: info: Status update: Node st002
now has status active
Jul 12 18:26:01 st001 heartbeat: [17264]: info: No pkts missing from
st002!
Jul 12 18:26:01 st001 heartbeat: [17264]: info: remote resource
transition completed.
Jul 12 18:26:02 st001 harc[17778]: info: Running /etc/ha.d/rc.d/status
status
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own our
resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own
foreign resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own our
resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own
foreign resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: WARN: G_CH_dispatch_int:
Dispatch function for read child took too long to execute: 90 ms (> 50
ms) (GSource: 0x9fd4530)
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own our
resources!
Jul 12 18:26:02 st001 heartbeat: [17264]: ERROR: Both machines own
foreign resources!
Jul 12 18:26:12 st001 heartbeat: [17264]: ERROR: Both machines own our
resources!
Jul 12 18:26:12 st001 heartbeat: [17264]: ERROR: Both machines own
foreign resources!
 
 
 
Server number two:
 
Jul 12 18:25:40 st002 logd: [9578]: info: logd started with default
configuration.
Jul 12 18:25:40 st002 logd: [9579]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Jul 12 18:25:40 st002 logd: [9578]: info: G_main_add_SignalHandler:
Added signal handler for signal 15
Jul 12 18:25:40 st002 heartbeat: [9730]: WARN: Logging daemon is
disabled --enabling logging daemon is recommended
Jul 12 18:25:40 st002 heartbeat: [9730]: info:
**************************
Jul 12 18:25:40 st002 heartbeat: [9730]: info: Configuration validated.
Starting heartbeat 2.0.7
Jul 12 18:25:40 st002 heartbeat: [9731]: info: heartbeat: version 2.0.7
Jul 12 18:25:41 st002 heartbeat: [9731]: info: Heartbeat generation: 9
Jul 12 18:25:41 st002 heartbeat: [9731]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jul 12 18:25:41 st002 heartbeat: [9731]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jul 12 18:25:41 st002 heartbeat: [9731]: info: Removing
/var/run/heartbeat/rsctmp failed, recreating.
Jul 12 18:25:41 st002 heartbeat: [9731]: info: glib: UDP multicast
heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1
loop=0)
Jul 12 18:25:41 st002 heartbeat: [9731]: info: glib: ping heartbeat
started.
Jul 12 18:25:41 st002 heartbeat: [9731]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Jul 12 18:25:41 st002 heartbeat: [9731]: info: Local status now set to:
'up'
Jul 12 18:25:42 st002 heartbeat: [9731]: info: Link
10.10.10.1:10.10.10.1 up.
Jul 12 18:25:42 st002 heartbeat: [9731]: info: Status update for node
10.10.10.1: status ping
Jul 12 18:26:01 st002 heartbeat: [9731]: WARN: node st001: is dead
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Comm_now_up(): updating
status to active
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Local status now set to:
'active'
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Starting child client
"/usr/lib/heartbeat/ipfail" (501,501)
Jul 12 18:26:01 st002 heartbeat: [9731]: WARN: No STONITH device
configured.
Jul 12 18:26:01 st002 heartbeat: [9731]: WARN: Shared disks are not
protected.
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Resources being acquired
from st001.
Jul 12 18:26:01 st002 heartbeat: [9740]: info: Starting
"/usr/lib/heartbeat/ipfail" as uid 501  gid 501 (pid 9740)
Jul 12 18:26:01 st002 harc[9741]: info: Running /etc/ha.d/rc.d/status
status
Jul 12 18:26:01 st002 heartbeat: [9742]: info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys st002] to acquire.
Jul 12 18:26:01 st002 heartbeat: [9731]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Jul 12 18:26:01 st002 mach_down[9761]: info: Taking over resource group
IPaddr2::10.10.10.9/24/eth0/10.10.10.255
Jul 12 18:26:01 st002 ResourceManager[9781]: info: Acquiring resource
group: st001 IPaddr2::10.10.10.9/24/eth0/10.10.10.255
Jul 12 18:26:01 st002 IPaddr2[9805]: INFO: IPaddr2 Resource is stopped
Jul 12 18:26:01 st002 ResourceManager[9781]: info: Running
/etc/ha.d/resource.d/IPaddr2 10.10.10.9/24/eth0/10.10.10.255 start
Jul 12 18:26:02 st002 IPaddr2[10019]: INFO: /sbin/ip -f inet addr add
10.10.10.9/24 brd 10.10.10.255 dev eth0
Jul 12 18:26:02 st002 IPaddr2[10019]: INFO: /sbin/ip link set eth0 up
Jul 12 18:26:02 st002 IPaddr2[10019]: INFO: /usr/lib/heartbeat/send_arp
-i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.10.10.9
eth0 10.10.10.9 auto 10.10.10.9 ffffffffffff
Jul 12 18:26:02 st002 IPaddr2[9937]: INFO: IPaddr2 Success
Jul 12 18:26:02 st002 mach_down[9761]: info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Jul 12 18:26:02 st002 mach_down[9761]: info: mach_down takeover complete
for node st001.
Jul 12 18:26:02 st002 heartbeat: [9731]: info: mach_down takeover
complete.
Jul 12 18:26:12 st002 heartbeat: [9731]: info: Local Resource
acquisition completed. (none)
Jul 12 18:26:12 st002 heartbeat: [9731]: info: local resource transition
completed.
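
To compare the two logs, the following Python sketch rejoins the lines that the mail client wrapped and measures how long each node waited between setting its local status to 'up' and declaring its peer dead. The sample entries are copied from the logs above. The placeholder year and the function names are assumptions made for illustration, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Four entries copied from the logs above, still hard-wrapped as mailed.
LOG = """\
Jul 12 18:24:17 st001 heartbeat: [17264]: info: Local status now set to:
'up'
Jul 12 18:24:37 st001 heartbeat: [17264]: WARN: node st002: is dead
Jul 12 18:25:41 st002 heartbeat: [9731]: info: Local status now set to:
'up'
Jul 12 18:26:01 st002 heartbeat: [9731]: WARN: node st001: is dead
"""

STAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")


def unwrap(text):
    """Rejoin continuation lines so each syslog entry is a single string."""
    entries = []
    for line in text.splitlines():
        if STAMP.match(line) or not entries:
            entries.append(line.strip())
        else:
            entries[-1] += " " + line.strip()
    return entries


def seconds_until_dead(entries):
    """Map host -> seconds from its 'up' status to its first 'is dead' warning."""
    up_at, result = {}, {}
    for entry in entries:
        parts = entry.split(None, 4)  # month, day, time, host, rest
        if len(parts) < 5:
            continue
        # Placeholder year (assumption): syslog lines do not include one.
        when = datetime.strptime(
            f"2000 {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
        host, rest = parts[3], parts[4]
        if "Local status now set to: 'up'" in rest:
            up_at[host] = when
        elif "is dead" in rest and host in up_at and host not in result:
            result[host] = int((when - up_at[host]).total_seconds())
    return result


print(seconds_until_dead(unwrap(LOG)))  # {'st001': 20, 'st002': 20}
```

Both nodes wait 20 seconds, which equals the initdead value in ha.cf. That would be consistent with st002 not receiving any heartbeat from st001 during its startup window, although the logs alone do not prove it.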

