Hi,

I am having trouble with a Heartbeat cluster.
1. Occasionally, I get timeouts on the IPaddr monitors (I have two IP addresses aliased on a single Ethernet):
Feb 15 15:05:16 fhbmplb1 lrmd: [17670]: WARN: on_op_timeout_expired: TIMEOUT: operation monitor[14] on ocf::IPaddr::IPaddr_2 for client 17673, its parameters: ip=[10.3.35.32] CRM_meta_op_target_rc=[7] netmask=[24] crm_feature_set=[1.0.6] .
Feb 15 15:05:16 fhbmplb1 lrmd: [17670]: WARN: on_op_timeout_expired: TIMEOUT: operation monitor[11] on ocf::IPaddr::IPaddr_4 for client 17673, its parameters: ip=[10.4.5.32] CRM_meta_op_target_rc=[7] netmask=[24] crm_feature_set=[1.0.6] .
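
A minimal sketch, assuming the log format shown above, of how these lrmd timeout warnings could be pulled out of a log to see how often they occur and on which resources. The function name `find_timeouts` is hypothetical, and the regular expression only covers the `on_op_timeout_expired` lines as they appear in this excerpt.

```python
import re

# Matches lrmd timeout warnings in the format shown in the excerpt above.
# Assumption: other heartbeat versions may word this message differently.
TIMEOUT_RE = re.compile(
    r"(?P<when>[A-Z][a-z]{2}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) lrmd: \[\d+\]: "
    r"WARN: on_op_timeout_expired: TIMEOUT: operation (?P<op>\w+)\[\d+\] "
    r"on (?P<agent>\S+)::(?P<rsc>\w+) for client \d+"
)


def find_timeouts(log_text):
    """Return (timestamp, resource, operation) for every lrmd timeout warning."""
    return [(m["when"], m["rsc"], m["op"]) for m in TIMEOUT_RE.finditer(log_text)]
```

Calling `find_timeouts` on the contents of the log file returns one tuple per warning, so the timestamps can be compared against other activity on the node.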

2. When this occurs, the CRM reports that some of the resources are to be recovered:
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Resource Group: group_1
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Filesystem_1 (heartbeat::ocf:Filesystem): Started fhbmplb1
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: IPaddr_2 (heartbeat::ocf:IPaddr): Started fhbmplb1 FAILED
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: wsdepmgr_3 (lsb:wsdepmgr): Started fhbmplb1
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Resource Group: group_2
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: IPaddr_4 (heartbeat::ocf:IPaddr): Started fhbmplb1 FAILED
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: arpinglb_5 (lsb:arpinglb): Started fhbmplb1
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c Leave resource Filesystem_1 (fhbmplb1)
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c Recover resource IPaddr_2 (fhbmplb1)
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: Recurring:native.c fhbmplb1 IPaddr_2_monitor_5000
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c Leave resource wsdepmgr_3 (fhbmplb1)
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c Recover resource IPaddr_4 (fhbmplb1)
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: Recurring:native.c fhbmplb1 IPaddr_4_monitor_5000
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c Leave resource arpinglb_5 (fhbmplb1)

In fact, IPaddr_2 and IPaddr_4 are both being recovered:
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: stage8:stages.c Created transition graph 4. Feb 15 15:05:18 fhbmplb1 pengine: [17687]: WARN: process_pe_message:pengine.c No value specified for cluster preference: pe-input-series-max Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: process_pe_message:pengine.c Transition 4: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-263.bz2 Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_state_transition:fsa.c fhbmplb1: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=do_msg_route ] Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: unpack_graph:unpack.c Unpacked transition 4: 20 actions in 20 synapses Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c Pseudo action 18 confirmed Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c Pseudo action 19 confirmed Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 14: wsdepmgr_3_stop_0 on fhbmplb1 Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c Pseudo action 25 confirmed Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 21: arpinglb_5_stop_0 on fhbmplb1 Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c Pseudo action 16 confirmed Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op stop on wsdepmgr_3 (interval=0ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:18 fhbmplb1 lrmd: [15641]: WARN: For LSB init script, no additional parameters are needed. Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op stop on arpinglb_5 (interval=0ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:18 fhbmplb1 lrmd: [15642]: WARN: For LSB init script, no additional parameters are needed. 
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c LRM operation (16) monitor_300000 on wsdepmgr_3 Cancelled Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c LRM operation (13) monitor_120000 on arpinglb_5 Cancelled Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) Ending deployment manager: dmgr Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output: (arpinglb_5:stop:stdout) Nothing to do Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (20) stop_0 on arpinglb_5 complete Feb 15 15:05:18 fhbmplb1 cib: [15640]: info: write_cib_contents:io.c Wrote version 0.45.11376 of the CIB to disk (digest: 3f402f15495cf45733fb61e9210aa2ca) Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:54): 0.45.11376 -> 0.45.11377 (ok)
Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11376 -> 0.45.11377
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action arpinglb_5_stop_0 (21) confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 7: IPaddr_4_stop_0 on fhbmplb1
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op stop on IPaddr_4 (interval=0ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c LRM operation (11) monitor_5000 on IPaddr_4 Cancelled
Feb 15 15:05:18 fhbmplb1 cib: [15646]: info: write_cib_contents:io.c Wrote version 0.45.11377 of the CIB to disk (digest: 8ae4066a99a414fee984b14a19153b4d)
Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15669]: INFO: /sbin/route -n del -host 10.4.5.32
Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output: (IPaddr_4:stop:stderr) SIOCDELRT: No such process
Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15671]: INFO: /sbin/ifconfig eth0:0 10.4.5.32 down
Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15674]: INFO: IP Address 10.4.5.32 released
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (22) stop_0 on IPaddr_4 complete
Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:55): 0.45.11377 -> 0.45.11378 (ok)
Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11377 -> 0.45.11378 Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action IPaddr_4_stop_0 (7) confirmed Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c Pseudo action 26 confirmed Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c Pseudo action 23 confirmed Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 20: IPaddr_4_start_0 on fhbmplb1 Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op start on IPaddr_4 (interval=0ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:18 fhbmplb1 cib: [15675]: info: write_cib_contents:io.c Wrote version 0.45.11378 of the CIB to disk (digest: 5c1745ebf93166ba3698f8b8a8ed07a4) Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15735]: INFO: /sbin/ifconfig eth0:0 10.4.5.32 netmask 255.255.255.0 broadcast 10.4.5.255 Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15742]: INFO: Sending Gratuitous Arp for 10.4.5.32 on eth0:0 [eth0] Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15743]: INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.4.5.32 eth0 10.4.5.32 auto 10.4.5.32 ffffffffffff Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (23) start_0 on IPaddr_4 complete Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:56): 0.45.11378 -> 0.45.11379 (ok)
Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11378 -> 0.45.11379 Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action IPaddr_4_start_0 (20) confirmed Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 6: IPaddr_4_monitor_5000 on fhbmplb1 Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 22: arpinglb_5_start_0 on fhbmplb1 Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op monitor on IPaddr_4 (interval=5000ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op start on arpinglb_5 (interval=0ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:18 fhbmplb1 lrmd: [15767]: WARN: For LSB init script, no additional parameters are needed. Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output: (arpinglb_5:start:stdout) Sending out ARP replies as 10.4.5.32 Feb 15 15:05:18 fhbmplb1 cib: [15764]: info: write_cib_contents:io.c Wrote version 0.45.11379 of the CIB to disk (digest: deea946b175c6692b067f5ae289a5ceb) Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (24) monitor_5000 on IPaddr_4 complete Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:57): 0.45.11379 -> 0.45.11380 (ok)
Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11379 -> 0.45.11380 Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action IPaddr_4_monitor_5000 (6) confirmed Feb 15 15:05:18 fhbmplb1 cib: [15795]: info: write_cib_contents:io.c Wrote version 0.45.11380 of the CIB to disk (digest: 537fd9703bde42e997662c8ce17e2082) Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (25) start_0 on arpinglb_5 complete Feb 15 15:05:20 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:58): 0.45.11380 -> 0.45.11381 (ok)
Feb 15 15:05:20 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11380 -> 0.45.11381 Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action arpinglb_5_start_0 (22) confirmed Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c Pseudo action 24 confirmed Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 1: arpinglb_5_monitor_120000 on fhbmplb1 Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op monitor on arpinglb_5 (interval=120000ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:20 fhbmplb1 cib: [15805]: info: write_cib_contents:io.c Wrote version 0.45.11381 of the CIB to disk (digest: b80af82378b5f39735f514e15b68d601) Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (26) monitor_120000 on arpinglb_5 complete Feb 15 15:05:20 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:59): 0.45.11381 -> 0.45.11382 (ok)
Feb 15 15:05:20 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11381 -> 0.45.11382 Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action arpinglb_5_monitor_120000 (1) confirmed Feb 15 15:05:20 fhbmplb1 cib: [15810]: info: write_cib_contents:io.c Wrote version 0.45.11382 of the CIB to disk (digest: 61ead707496c8b9e495030d8184edbf3) Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) ADMU0116I: Tool information is being logged in file Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) /opt/IBM/WebSphere/AppServer/profiles/Dmgr01/logs/dmgr/stopServer.log Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) Feb 15 15:05:21 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) ADMU0128I: Starting tool with the Dmgr01 profile ADMU3100I: Reading configuration for server: dmgr Feb 15 15:05:25 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) ADMU3201I: Server stop request issued. Waiting for stop status. Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) ADMU4000I: Server dmgr stop completed. Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stderr) /dev/sdb1: ce Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:stop:stdout) 18107 Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (18) stop_0 on wsdepmgr_3 complete Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:60): 0.45.11382 -> 0.45.11383 (ok)
Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11382 -> 0.45.11383 Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action wsdepmgr_3_stop_0 (14) confirmed Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 3: IPaddr_2_stop_0 on fhbmplb1 Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op stop on IPaddr_2 (interval=0ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:39 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c LRM operation (14) monitor_5000 on IPaddr_2 Cancelled Feb 15 15:05:39 fhbmplb1 cib: [16198]: info: write_cib_contents:io.c Wrote version 0.45.11383 of the CIB to disk (digest: 5ae0d2f817797c6374f8708c68bfcee4) Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16220]: INFO: /sbin/route -n del -host 10.3.35.32 Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: (IPaddr_2:stop:stderr) SIOCDELRT: No such process Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16222]: INFO: /sbin/ifconfig eth1:0 10.3.35.32 down Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16225]: INFO: IP Address 10.3.35.32 released Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (28) stop_0 on IPaddr_2 complete Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:61): 0.45.11383 -> 0.45.11384 (ok)
Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11383 -> 0.45.11384 Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action IPaddr_2_stop_0 (3) confirmed Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 13: IPaddr_2_start_0 on fhbmplb1 Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op start on IPaddr_2 (interval=0ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:39 fhbmplb1 cib: [16227]: info: write_cib_contents:io.c Wrote version 0.45.11384 of the CIB to disk (digest: 7902ae928cad52cab1de6effbdcb5de5) Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16274]: INFO: /sbin/ifconfig eth1:0 10.3.35.32 netmask 255.255.255.0 broadcast 10.3.35.255 Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16279]: INFO: Sending Gratuitous Arp for 10.3.35.32 on eth1:0 [eth1] Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16280]: INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-10.3.35.32 eth1 10.3.35.32 auto 10.3.35.32 ffffffffffff Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (29) start_0 on IPaddr_2 complete Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:62): 0.45.11384 -> 0.45.11385 (ok)
Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11384 -> 0.45.11385 Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action IPaddr_2_start_0 (13) confirmed Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 2: IPaddr_2_monitor_5000 on fhbmplb1 Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 15: wsdepmgr_3_start_0 on fhbmplb1 Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op monitor on IPaddr_2 (interval=5000ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op start on wsdepmgr_3 (interval=0ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:05:39 fhbmplb1 lrmd: [16291]: WARN: For LSB init script, no additional parameters are needed. Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) Sending out ARP replies as 10.3.5.32 Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stderr) bind: Cannot assign requested address Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) Starting deployment manager: dmgr Feb 15 15:05:39 fhbmplb1 cib: [16289]: info: write_cib_contents:io.c Wrote version 0.45.11385 of the CIB to disk (digest: 1bba9eb984b137be524e65dfc48ee4f9)
Feb 15 15:05:39 fhbmplb1 su: (to wasadmin) root on none
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (30) monitor_5000 on IPaddr_2 complete Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:63): 0.45.11385 -> 0.45.11386 (ok)
Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11385 -> 0.45.11386 Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action IPaddr_2_monitor_5000 (2) confirmed Feb 15 15:05:39 fhbmplb1 cib: [16334]: info: write_cib_contents:io.c Wrote version 0.45.11386 of the CIB to disk (digest: a2e0897afb893ea208c6aa39fc6506cd) Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) ADMU0116I: Tool information is being logged in file Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) /opt/IBM/WebSphere/AppServer/profiles/Dmgr01/logs/dmgr/startServer.log Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) ADMU0128I: Starting tool with the Dmgr01 profile Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) ADMU3100I: Reading configuration for server: dmgr Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) Feb 15 15:05:44 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) ADMU3200I: Server launched. Waiting for initialization status. Feb 15 15:06:12 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) ADMU3000I: Server dmgr open for e-business; process id is 16401 Feb 15 15:06:12 fhbmplb1 lrmd: [17670]: info: RA output: (wsdepmgr_3:start:stdout) Feb 15 15:06:12 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (31) start_0 on wsdepmgr_3 complete Feb 15 15:06:12 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:64): 0.45.11386 -> 0.45.11387 (ok)
Feb 15 15:06:12 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11386 -> 0.45.11387 Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action wsdepmgr_3_start_0 (15) confirmed Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c Pseudo action 17 confirmed Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c Initiating action 4: wsdepmgr_3_monitor_300000 on fhbmplb1 Feb 15 15:06:12 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c Performing op monitor on wsdepmgr_3 (interval=300000ms, key=4:a3a46044-31af-41f0-bf98-10646065e52d) Feb 15 15:06:12 fhbmplb1 cib: [16811]: info: write_cib_contents:io.c Wrote version 0.45.11387 of the CIB to disk (digest: d3c40a0cd875c87af78426d115d3615f) Feb 15 15:06:21 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM operation (32) monitor_300000 on wsdepmgr_3 complete Feb 15 15:06:21 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c Update (client: 17673, call:65): 0.45.11387 -> 0.45.11388 (ok)
Feb 15 15:06:21 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11387 -> 0.45.11388 Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c Action wsdepmgr_3_monitor_300000 (4) confirmed Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: run_graph:graph.c Transition 4: (Complete=20, Pending=0, Fired=0, Skipped=0, Incomplete=0) Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: notify_crmd:actions.c Transition 4 status: te_complete - (null) Feb 15 15:06:21 fhbmplb1 crmd: [17673]: info: do_state_transition:fsa.c fhbmplb1: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=do_msg_route ] Feb 15 15:06:21 fhbmplb1 cib: [17102]: info: write_cib_contents:io.c Wrote version 0.45.11388 of the CIB to disk (digest: b68ab97e9a799fef743e211d7b8625ff)

But wsdepmgr_3 and arpinglb_5 are also stopped and restarted.

There are two things I do not understand:
1. Why do the timeouts occur on the IPaddr resources? (They occur sporadically, without any relation to the network load.)
2. If the IPaddr resource can be (and is intended to be) recovered by the CRM, why are the other resources in the resource group also cycled?

The cluster version is: 2.0.5-7.10
The cib.xml is attached.
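
A minimal sketch, assuming the attached file is well-formed XML in the layout shown below, that lists the monitor interval and timeout of each group member. It makes it easy to see, for example, that both IPaddr monitors run every 5s with a 5s timeout. The function name `monitor_ops` is hypothetical.

```python
import xml.etree.ElementTree as ET


def monitor_ops(cib_xml):
    """Yield (group id, primitive id, interval, timeout) for each monitor op."""
    root = ET.fromstring(cib_xml)
    # Walk the groups in document order; members are listed in group order.
    for group in root.iter("group"):
        for prim in group.findall("primitive"):
            for op in prim.iter("op"):
                if op.get("name") == "monitor":
                    yield (group.get("id"), prim.get("id"),
                           op.get("interval"), op.get("timeout"))
```

Passing the text of cib.xml to `monitor_ops` and printing each tuple gives one line per monitored resource, in the order the resources appear in their group.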


Thanks in advance for any help,

Imre

 <cib generated="true" admin_epoch="0" have_quorum="true" num_peers="2" cib_feature_revision="1.3" epoch="47" num_updates="11585" cib-last-written="Mon Feb 23 11:42:11 2009" crm-debug-origin="create_node_entry" ccm_transition="2" dc_uuid="c42562a1-fbc4-4180-a3c2-41752bbe56d5">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="cib-bootstrap-options-symmetric_cluster" name="symmetric_cluster" value="true"/>
           <nvpair id="cib-bootstrap-options-no_quorum_policy" name="no_quorum_policy" value="stop"/>
           <nvpair id="cib-bootstrap-options-default_resource_stickiness" name="default_resource_stickiness" value="0"/>
           <nvpair id="cib-bootstrap-options-stonith_enabled" name="stonith_enabled" value="false"/>
           <nvpair id="cib-bootstrap-options-stop_orphan_resources" name="stop_orphan_resources" value="true"/>
           <nvpair id="cib-bootstrap-options-stop_orphan_actions" name="stop_orphan_actions" value="true"/>
           <nvpair id="cib-bootstrap-options-remove_after_stop" name="remove_after_stop" value="false"/>
           <nvpair id="cib-bootstrap-options-transition_idle_timeout" name="transition_idle_timeout" value="5min"/>
           <nvpair id="cib-bootstrap-options-is_managed_default" name="is_managed_default" value="true"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="c42562a1-fbc4-4180-a3c2-41752bbe56d5" uname="fhbmplb1" type="normal"/>
       <node id="5c447dff-4b89-422b-92f6-2985663eb5c4" uname="fhbmplb2" type="normal"/>
     </nodes>
     <resources>
       <group id="group_1">
         <instance_attributes id="group_1">
           <attributes>
             <nvpair name="resource_stickiness" id="group_1-resource_stickiness" value="INFINITY"/>
             <nvpair id="group_1-resource" name="resource" value="INFINITY"/>
           </attributes>
         </instance_attributes>
         <primitive class="ocf" id="Filesystem_1" provider="heartbeat" type="Filesystem">
           <operations>
             <op id="Filesystem_1_mon" interval="120s" name="monitor" timeout="60s"/>
           </operations>
           <instance_attributes id="Filesystem_1_inst_attr">
             <attributes>
               <nvpair id="Filesystem_1_attr_0" name="device" value="/dev/sdb1"/>
               <nvpair id="Filesystem_1_attr_1" name="directory" value="/opt/IBM"/>
               <nvpair id="Filesystem_1_attr_2" name="fstype" value="ext3"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive class="ocf" id="IPaddr_2" provider="heartbeat" type="IPaddr">
           <operations>
             <op id="IPaddr_2_mon" interval="5s" name="monitor" timeout="5s"/>
           </operations>
           <instance_attributes id="IPaddr_2_inst_attr">
             <attributes>
               <nvpair id="IPaddr_2_attr_0" name="ip" value="10.3.35.32"/>
               <nvpair id="IPaddr_2_attr_1" name="netmask" value="24"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive class="lsb" id="wsdepmgr_3" provider="heartbeat" type="wsdepmgr">
           <operations>
             <op id="wsdepmgr_3_stop" name="stop" timeout="120s"/>
             <op id="wsdepmgr_3_start" name="start" timeout="120s"/>
             <op id="wsdepmgr_3_mon" interval="300s" name="monitor" timeout="120s"/>
           </operations>
         </primitive>
       </group>
       <group id="group_2">
         <instance_attributes id="group_2">
           <attributes>
             <nvpair name="resource_stickiness" id="group_2-resource_stickiness" value="INFINITY"/>
             <nvpair id="group_2-resource" name="resource" value="INFINITY"/>
           </attributes>
         </instance_attributes>
         <primitive class="ocf" id="IPaddr_4" provider="heartbeat" type="IPaddr">
           <operations>
             <op id="IPaddr_4_mon" interval="5s" name="monitor" timeout="5s"/>
           </operations>
           <instance_attributes id="IPaddr_4_inst_attr">
             <attributes>
               <nvpair id="IPaddr_4_attr_0" name="ip" value="10.4.5.32"/>
               <nvpair id="IPaddr_4_attr_1" name="netmask" value="24"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive class="lsb" id="arpinglb_5" provider="heartbeat" type="arpinglb">
           <operations>
             <op id="arpinglb_5_mon" interval="120s" name="monitor" timeout="60s"/>
           </operations>
         </primitive>
       </group>
     </resources>
     <constraints>
       <rsc_location id="rsc_location_group_1" rsc="group_1">
         <rule id="prefered_location_group_1" score="100">
           <expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="fhbmpswapp1"/>
         </rule>
       </rsc_location>
       <rsc_location id="rsc_location_group_2" rsc="group_2">
         <rule id="prefered_location_group_2" score="100">
           <expression attribute="#uname" id="prefered_location_group_2_expr" operation="eq" value="fhbmpswapp1"/>
         </rule>
       </rsc_location>
       <rsc_order id="order_IPaddr_2_Filesystem_1" from="IPaddr_2" action="start" type="after" to="Filesystem_1"/>
       <rsc_order id="order_wsdepmgr_3_IPaddr_2" from="wsdepmgr_3" action="start" type="after" to="IPaddr_2"/>
     </constraints>
   </configuration>
 </cib>