Hi,

On Tue, Feb 24, 2009 at 04:56:47PM +0100, Imre Sandor wrote:
> Hi,
>
> I am having troubles with a heartbeat cluster.
> 1. Occasionally, I get timeouts on the IPaddr monitors (I have two IP 
> addresses aliased on a single Ethernet)
> Feb 15 15:05:16 fhbmplb1 lrmd: [17670]: WARN: on_op_timeout_expired: 
> TIMEOUT: operation monitor[14] on ocf::IPaddr::IPaddr_2 for client 17673, 
> its parameters: ip=[10.3.35.32] CRM_meta_op_target_rc=[7] netmask=[24] 
> crm_feature_set=[1.0.6] .
> Feb 15 15:05:16 fhbmplb1 lrmd: [17670]: WARN: on_op_timeout_expired: 
> TIMEOUT: operation monitor[11] on ocf::IPaddr::IPaddr_4 for client 17673, 
> its parameters: ip=[10.4.5.32] CRM_meta_op_target_rc=[7] netmask=[24] 
> crm_feature_set=[1.0.6] .
>
> 2. When this occurs, the crm gives me info that some of the resources are 
> to be recovered:
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Resource Group: group_1
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Filesystem_1 
> (heartbeat::ocf:Filesystem): Started fhbmplb1
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: IPaddr_2 
> (heartbeat::ocf:IPaddr): Started fhbmplb1 FAILED
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: wsdepmgr_3 (lsb:wsdepmgr): 
> Started fhbmplb1
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Resource Group: group_2
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: IPaddr_4 
> (heartbeat::ocf:IPaddr): Started fhbmplb1 FAILED
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: arpinglb_5 (lsb:arpinglb): 
> Started fhbmplb1
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c 
> Leave resource Filesystem_1 (fhbmplb1)
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c 
> Recover resource IPaddr_2 (fhbmplb1)
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: Recurring:native.c 
> fhbmplb1 IPaddr_2_monitor_5000
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c 
> Leave resource wsdepmgr_3 (fhbmplb1)
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c 
> Recover resource IPaddr_4 (fhbmplb1)
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: Recurring:native.c 
> fhbmplb1 IPaddr_4_monitor_5000
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c 
> Leave resource arpinglb_5 (fhbmplb1)
>
> In fact, IPaddr_2 and IPaddr_4 are both being recovered:
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: stage8:stages.c Created 
> transition graph 4.
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: WARN: 
> process_pe_message:pengine.c No value specified for cluster preference: 
> pe-input-series-max
> Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: 
> process_pe_message:pengine.c Transition 4: PEngine Input stored in: 
> /var/lib/heartbeat/pengine/pe-input-263.bz2
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_state_transition:fsa.c 
> fhbmplb1: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=do_msg_route ]
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: unpack_graph:unpack.c 
> Unpacked transition 4: 20 actions in 20 synapses
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c 
> Pseudo action 18 confirmed
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c 
> Pseudo action 19 confirmed
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 14: wsdepmgr_3_stop_0 on fhbmplb1
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c 
> Pseudo action 25 confirmed
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 21: arpinglb_5_stop_0 on fhbmplb1
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c 
> Pseudo action 16 confirmed
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op stop on wsdepmgr_3 (interval=0ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:18 fhbmplb1 lrmd: [15641]: WARN: For LSB init script, no 
> additional parameters are needed.
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op stop on arpinglb_5 (interval=0ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:18 fhbmplb1 lrmd: [15642]: WARN: For LSB init script, no 
> additional parameters are needed.
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c LRM 
> operation (16) monitor_300000 on wsdepmgr_3 Cancelled
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c LRM 
> operation (13) monitor_120000 on arpinglb_5 Cancelled
> Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout) Ending deployment manager: dmgr
> Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output: 
> (arpinglb_5:stop:stdout) Nothing to do
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (20) stop_0 on arpinglb_5 complete
> Feb 15 15:05:18 fhbmplb1 cib: [15640]: info: write_cib_contents:io.c Wrote 
> version 0.45.11376 of the CIB to disk (digest: 
> 3f402f15495cf45733fb61e9210aa2ca)
> Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:54): 0.45.11376 -> 0.45.11377 (ok)
> Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11376 -> 0.45.11377
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action arpinglb_5_stop_0 (21) confirmed
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 7: IPaddr_4_stop_0 on fhbmplb1
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op stop on IPaddr_4 (interval=0ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c LRM 
> operation (11) monitor_5000 on IPaddr_4 Cancelled
> Feb 15 15:05:18 fhbmplb1 cib: [15646]: info: write_cib_contents:io.c Wrote 
> version 0.45.11377 of the CIB to disk (digest: 
> 8ae4066a99a414fee984b14a19153b4d)
> Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15669]: INFO: /sbin/route -n del 
> -host 10.4.5.32~
> Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output: 
> (IPaddr_4:stop:stderr) SIOCDELRT: No such process
> Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15671]: INFO: /sbin/ifconfig 
> eth0:0 10.4.5.32 downc
> Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15674]: INFO: IP Address 10.4.5.32 
> released???
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (22) stop_0 on IPaddr_4 complete
> Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:55): 0.45.11377 -> 0.45.11378 (ok)
> Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11377 -> 0.45.11378
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action IPaddr_4_stop_0 (7) confirmed
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c 
> Pseudo action 26 confirmed
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c 
> Pseudo action 23 confirmed
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 20: IPaddr_4_start_0 on fhbmplb1
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op start on IPaddr_4 (interval=0ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:18 fhbmplb1 cib: [15675]: info: write_cib_contents:io.c Wrote 
> version 0.45.11378 of the CIB to disk (digest: 
> 5c1745ebf93166ba3698f8b8a8ed07a4)
> Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15735]: INFO: /sbin/ifconfig 
> eth0:0 10.4.5.32 netmask 255.255.255.0 broadcast 10.4.5.255
> Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15742]: INFO: Sending Gratuitous 
> Arp for 10.4.5.32 on eth0:0 [eth0]
> Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15743]: INFO: 
> /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p 
> /var/run/heartbeat/rsctmp/send_arp/send_arp-10.4.5.32 eth0 10.4.5.32 auto 
> 10.4.5.32 ffffffffffff
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (23) start_0 on IPaddr_4 complete
> Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:56): 0.45.11378 -> 0.45.11379 (ok)
> Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11378 -> 0.45.11379
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action IPaddr_4_start_0 (20) confirmed
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 6: IPaddr_4_monitor_5000 on fhbmplb1
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 22: arpinglb_5_start_0 on fhbmplb1
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op monitor on IPaddr_4 (interval=5000ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op start on arpinglb_5 (interval=0ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:18 fhbmplb1 lrmd: [15767]: WARN: For LSB init script, no 
> additional parameters are needed.
> Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output: 
> (arpinglb_5:start:stdout) Sending out ARP replies as 10.4.5.32
> Feb 15 15:05:18 fhbmplb1 cib: [15764]: info: write_cib_contents:io.c Wrote 
> version 0.45.11379 of the CIB to disk (digest: 
> deea946b175c6692b067f5ae289a5ceb)
> Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (24) monitor_5000 on IPaddr_4 complete
> Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:57): 0.45.11379 -> 0.45.11380 (ok)
> Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11379 -> 0.45.11380
> Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action IPaddr_4_monitor_5000 (6) confirmed
> Feb 15 15:05:18 fhbmplb1 cib: [15795]: info: write_cib_contents:io.c Wrote 
> version 0.45.11380 of the CIB to disk (digest: 
> 537fd9703bde42e997662c8ce17e2082)
> Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (25) start_0 on arpinglb_5 complete
> Feb 15 15:05:20 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:58): 0.45.11380 -> 0.45.11381 (ok)
> Feb 15 15:05:20 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11380 -> 0.45.11381
> Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action arpinglb_5_start_0 (22) confirmed
> Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c 
> Pseudo action 24 confirmed
> Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 1: arpinglb_5_monitor_120000 on fhbmplb1
> Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op monitor on arpinglb_5 (interval=120000ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:20 fhbmplb1 cib: [15805]: info: write_cib_contents:io.c Wrote 
> version 0.45.11381 of the CIB to disk (digest: 
> b80af82378b5f39735f514e15b68d601)
> Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (26) monitor_120000 on arpinglb_5 complete
> Feb 15 15:05:20 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:59): 0.45.11381 -> 0.45.11382 (ok)
> Feb 15 15:05:20 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11381 -> 0.45.11382
> Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action arpinglb_5_monitor_120000 (1) confirmed
> Feb 15 15:05:20 fhbmplb1 cib: [15810]: info: write_cib_contents:io.c Wrote 
> version 0.45.11382 of the CIB to disk (digest: 
> 61ead707496c8b9e495030d8184edbf3)
> Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout) ADMU0116I: Tool information is being logged in 
> file
> Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout) 
> /opt/IBM/WebSphere/AppServer/profiles/Dmgr01/logs/dmgr/stopServer.log
> Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout)
> Feb 15 15:05:21 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout) ADMU0128I: Starting tool with the Dmgr01 profile 
> ADMU3100I: Reading configuration for server: dmgr
> Feb 15 15:05:25 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout) ADMU3201I: Server stop request issued. Waiting for 
> stop status.
> Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout) ADMU4000I: Server dmgr stop completed.
> Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout)
> Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stderr) /dev/sdb1: ce
> Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:stop:stdout) 18107
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (18) stop_0 on wsdepmgr_3 complete
> Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:60): 0.45.11382 -> 0.45.11383 (ok)
> Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11382 -> 0.45.11383
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action wsdepmgr_3_stop_0 (14) confirmed
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 3: IPaddr_2_stop_0 on fhbmplb1
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op stop on IPaddr_2 (interval=0ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c LRM 
> operation (14) monitor_5000 on IPaddr_2 Cancelled
> Feb 15 15:05:39 fhbmplb1 cib: [16198]: info: write_cib_contents:io.c Wrote 
> version 0.45.11383 of the CIB to disk (digest: 
> 5ae0d2f817797c6374f8708c68bfcee4)
> Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16220]: INFO: /sbin/route -n del 
> -host 10.3.35.32
> Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: 
> (IPaddr_2:stop:stderr) SIOCDELRT: No such process
> Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16222]: INFO: /sbin/ifconfig 
> eth1:0 10.3.35.32 down
> Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16225]: INFO: IP Address 
> 10.3.35.32 released
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (28) stop_0 on IPaddr_2 complete
> Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:61): 0.45.11383 -> 0.45.11384 (ok)
> Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11383 -> 0.45.11384
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action IPaddr_2_stop_0 (3) confirmed
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 13: IPaddr_2_start_0 on fhbmplb1
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op start on IPaddr_2 (interval=0ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:39 fhbmplb1 cib: [16227]: info: write_cib_contents:io.c Wrote 
> version 0.45.11384 of the CIB to disk (digest: 
> 7902ae928cad52cab1de6effbdcb5de5)
> Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16274]: INFO: /sbin/ifconfig 
> eth1:0 10.3.35.32 netmask 255.255.255.0 broadcast 10.3.35.255
> Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16279]: INFO: Sending Gratuitous 
> Arp for 10.3.35.32 on eth1:0 [eth1]
> Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16280]: INFO: 
> /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p 
> /var/run/heartbeat/rsctmp/send_arp/send_arp-10.3.35.32 eth1 10.3.35.32 auto 
> 10.3.35.32 ffffffffffff
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (29) start_0 on IPaddr_2 complete
> Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:62): 0.45.11384 -> 0.45.11385 (ok)
> Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11384 -> 0.45.11385
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action IPaddr_2_start_0 (13) confirmed
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 2: IPaddr_2_monitor_5000 on fhbmplb1
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 15: wsdepmgr_3_start_0 on fhbmplb1
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op monitor on IPaddr_2 (interval=5000ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op start on wsdepmgr_3 (interval=0ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:05:39 fhbmplb1 lrmd: [16291]: WARN: For LSB init script, no 
> additional parameters are needed.
> Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout) Sending out ARP replies as 10.3.5.32
> Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stderr) bind: Cannot assign requested address
> Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout) Starting deployment manager: dmgr
> Feb 15 15:05:39 fhbmplb1 cib: [16289]: info: write_cib_contents:io.c Wrote 
> version 0.45.11385 of the CIB to disk (digest: 
> 1bba9eb984b137be524e65dfc48ee4f9)
> Feb 15 15:05:39 fhbmplb1 su: (to wasadmin) root on none
> Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (30) monitor_5000 on IPaddr_2 complete
> Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:63): 0.45.11385 -> 0.45.11386 (ok)
> Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11385 -> 0.45.11386
> Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action IPaddr_2_monitor_5000 (2) confirmed
> Feb 15 15:05:39 fhbmplb1 cib: [16334]: info: write_cib_contents:io.c Wrote 
> version 0.45.11386 of the CIB to disk (digest: 
> a2e0897afb893ea208c6aa39fc6506cd)
> Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout) ADMU0116I: Tool information is being logged in 
> file
> Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout)
> Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout) 
> /opt/IBM/WebSphere/AppServer/profiles/Dmgr01/logs/dmgr/startServer.log
> Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout) ADMU0128I: Starting tool with the Dmgr01 profile
> Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout) ADMU3100I: Reading configuration for server: dmgr
> Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout)
> Feb 15 15:05:44 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout) ADMU3200I: Server launched. Waiting for 
> initialization status.
> Feb 15 15:06:12 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout) ADMU3000I: Server dmgr open for e-business; 
> process id is 16401
> Feb 15 15:06:12 fhbmplb1 lrmd: [17670]: info: RA output: 
> (wsdepmgr_3:start:stdout)
> Feb 15 15:06:12 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (31) start_0 on wsdepmgr_3 complete
> Feb 15 15:06:12 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:64): 0.45.11386 -> 0.45.11387 (ok)
> Feb 15 15:06:12 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11386 -> 0.45.11387
> Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action wsdepmgr_3_start_0 (15) confirmed
> Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info: te_pseudo_action:actions.c 
> Pseudo action 17 confirmed
> Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info: send_rsc_command:actions.c 
> Initiating action 4: wsdepmgr_3_monitor_300000 on fhbmplb1
> Feb 15 15:06:12 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c 
> Performing op monitor on wsdepmgr_3 (interval=300000ms, 
> key=4:a3a46044-31af-41f0-bf98-10646065e52d)
> Feb 15 15:06:12 fhbmplb1 cib: [16811]: info: write_cib_contents:io.c Wrote 
> version 0.45.11387 of the CIB to disk (digest: 
> d3c40a0cd875c87af78426d115d3615f)
> Feb 15 15:06:21 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c LRM 
> operation (32) monitor_300000 on wsdepmgr_3 complete
> Feb 15 15:06:21 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c 
> Update (client: 17673, call:65): 0.45.11387 -> 0.45.11388 (ok)
> Feb 15 15:06:21 fhbmplb1 mgmtd: [17674]: debug: update cib finished
> Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: te_update_diff:callbacks.c 
> Processing diff (cib_update): 0.45.11387 -> 0.45.11388
> Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: match_graph_event:events.c 
> Action wsdepmgr_3_monitor_300000 (4) confirmed
> Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: run_graph:graph.c 
> Transition 4: (Complete=20, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: notify_crmd:actions.c 
> Transition 4 status: te_complete - (null)
> Feb 15 15:06:21 fhbmplb1 crmd: [17673]: info: do_state_transition:fsa.c 
> fhbmplb1: State transition S_TRANSITION_ENGINE -> S_IDLE [ 
> input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=do_msg_route ]
> Feb 15 15:06:21 fhbmplb1 cib: [17102]: info: write_cib_contents:io.c Wrote 
> version 0.45.11388 of the CIB to disk (digest: 
> b68ab97e9a799fef743e211d7b8625ff)
>
> but wsdepmgr_3 and arpinglb_5 are also brought down and up.
>
> There are two things I do not understand:
> 1. why does the timeout occur on the IPaddr resources (they occur 
> sporadically without any relation to the network load)

Really hard to say why. But you should increase the timeout. Five
seconds is too short.

> 2. if the IPaddr resource can (and intended to) be recovered by crm, why 
> the other resources in the resource group are also
> cycled.
>
> The cluster version is: 2.0.5-7.10

You should definitely upgrade. For this version you won't get
much support.

Thanks,

Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to