Hi,
I am having troubles with a heartbeat cluster.
1. Occasionally, I get timeouts on the IPaddr monitors (I have two IP
addresses aliased on a single Ethernet)
Feb 15 15:05:16 fhbmplb1 lrmd: [17670]: WARN: on_op_timeout_expired:
TIMEOUT: operation monitor[14] on ocf::IPaddr::IPaddr_2 for client
17673, its parameters: ip=[10.3.35.32] CRM_meta_op_target_rc=[7]
netmask=[24] crm_feature_set=[1.0.6] .
Feb 15 15:05:16 fhbmplb1 lrmd: [17670]: WARN: on_op_timeout_expired:
TIMEOUT: operation monitor[11] on ocf::IPaddr::IPaddr_4 for client
17673, its parameters: ip=[10.4.5.32] CRM_meta_op_target_rc=[7]
netmask=[24] crm_feature_set=[1.0.6] .
2. When this occurs, the crm gives me info that some of the resources
are to be recovered:
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Resource Group: group_1
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Filesystem_1
(heartbeat::ocf:Filesystem): Started fhbmplb1
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: IPaddr_2
(heartbeat::ocf:IPaddr): Started fhbmplb1 FAILED
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: wsdepmgr_3
(lsb:wsdepmgr): Started fhbmplb1
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: Resource Group: group_2
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: IPaddr_4
(heartbeat::ocf:IPaddr): Started fhbmplb1 FAILED
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info: arpinglb_5
(lsb:arpinglb): Started fhbmplb1
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c
Leave resource Filesystem_1 (fhbmplb1)
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c
Recover resource IPaddr_2 (fhbmplb1)
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: Recurring:native.c
fhbmplb1 IPaddr_2_monitor_5000
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c
Leave resource wsdepmgr_3 (fhbmplb1)
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c
Recover resource IPaddr_4 (fhbmplb1)
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: Recurring:native.c
fhbmplb1 IPaddr_4_monitor_5000
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: NoRoleChange:native.c
Leave resource arpinglb_5 (fhbmplb1)
In fact, IPaddr_2 and IPaddr_4 are both being recovered:
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: notice: stage8:stages.c
Created transition graph 4.
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: WARN:
process_pe_message:pengine.c No value specified for cluster preference:
pe-input-series-max
Feb 15 15:05:18 fhbmplb1 pengine: [17687]: info:
process_pe_message:pengine.c Transition 4: PEngine Input stored in:
/var/lib/heartbeat/pengine/pe-input-263.bz2
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_state_transition:fsa.c
fhbmplb1: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=do_msg_route ]
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info: unpack_graph:unpack.c
Unpacked transition 4: 20 actions in 20 synapses
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_pseudo_action:actions.c Pseudo action 18 confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_pseudo_action:actions.c Pseudo action 19 confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 14: wsdepmgr_3_stop_0 on
fhbmplb1
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_pseudo_action:actions.c Pseudo action 25 confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 21: arpinglb_5_stop_0 on
fhbmplb1
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_pseudo_action:actions.c Pseudo action 16 confirmed
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op stop on wsdepmgr_3 (interval=0ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:18 fhbmplb1 lrmd: [15641]: WARN: For LSB init script, no
additional parameters are needed.
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op stop on arpinglb_5 (interval=0ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:18 fhbmplb1 lrmd: [15642]: WARN: For LSB init script, no
additional parameters are needed.
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c
LRM operation (16) monitor_300000 on wsdepmgr_3 Cancelled
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c
LRM operation (13) monitor_120000 on arpinglb_5 Cancelled
Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout) Ending deployment manager: dmgr
Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output:
(arpinglb_5:stop:stdout) Nothing to do
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (20) stop_0 on arpinglb_5 complete
Feb 15 15:05:18 fhbmplb1 cib: [15640]: info: write_cib_contents:io.c
Wrote version 0.45.11376 of the CIB to disk (digest:
3f402f15495cf45733fb61e9210aa2ca)
Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:54): 0.45.11376 -> 0.45.11377 (ok)
Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11376 ->
0.45.11377
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action arpinglb_5_stop_0 (21) confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 7: IPaddr_4_stop_0 on fhbmplb1
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op stop on IPaddr_4 (interval=0ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c
LRM operation (11) monitor_5000 on IPaddr_4 Cancelled
Feb 15 15:05:18 fhbmplb1 cib: [15646]: info: write_cib_contents:io.c
Wrote version 0.45.11377 of the CIB to disk (digest:
8ae4066a99a414fee984b14a19153b4d)
Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15669]: INFO: /sbin/route -n
del -host 10.4.5.32~
Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output:
(IPaddr_4:stop:stderr) SIOCDELRT: No such process
Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15671]: INFO: /sbin/ifconfig
eth0:0 10.4.5.32 downc
Feb 15 15:05:18 fhbmplb1 IPaddr[15648]: [15674]: INFO: IP Address
10.4.5.32 released‡
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (22) stop_0 on IPaddr_4 complete
Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:55): 0.45.11377 -> 0.45.11378 (ok)
Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11377 ->
0.45.11378
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action IPaddr_4_stop_0 (7) confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_pseudo_action:actions.c Pseudo action 26 confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_pseudo_action:actions.c Pseudo action 23 confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 20: IPaddr_4_start_0 on
fhbmplb1
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op start on IPaddr_4 (interval=0ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:18 fhbmplb1 cib: [15675]: info: write_cib_contents:io.c
Wrote version 0.45.11378 of the CIB to disk (digest:
5c1745ebf93166ba3698f8b8a8ed07a4)
Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15735]: INFO: /sbin/ifconfig
eth0:0 10.4.5.32 netmask 255.255.255.0 broadcast 10.4.5.255
Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15742]: INFO: Sending
Gratuitous Arp for 10.4.5.32 on eth0:0 [eth0]
Feb 15 15:05:18 fhbmplb1 IPaddr[15676]: [15743]: INFO:
/usr/lib64/heartbeat/send_arp -i 500 -r 10 -p
/var/run/heartbeat/rsctmp/send_arp/send_arp-10.4.5.32 eth0 10.4.5.32
auto 10.4.5.32 ffffffffffff
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (23) start_0 on IPaddr_4 complete
Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:56): 0.45.11378 -> 0.45.11379 (ok)
Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11378 ->
0.45.11379
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action IPaddr_4_start_0 (20) confirmed
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 6: IPaddr_4_monitor_5000 on
fhbmplb1
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 22: arpinglb_5_start_0 on
fhbmplb1
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op monitor on IPaddr_4 (interval=5000ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op start on arpinglb_5 (interval=0ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:18 fhbmplb1 lrmd: [15767]: WARN: For LSB init script, no
additional parameters are needed.
Feb 15 15:05:18 fhbmplb1 lrmd: [17670]: info: RA output:
(arpinglb_5:start:stdout) Sending out ARP replies as 10.4.5.32
Feb 15 15:05:18 fhbmplb1 cib: [15764]: info: write_cib_contents:io.c
Wrote version 0.45.11379 of the CIB to disk (digest:
deea946b175c6692b067f5ae289a5ceb)
Feb 15 15:05:18 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (24) monitor_5000 on IPaddr_4 complete
Feb 15 15:05:18 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:57): 0.45.11379 -> 0.45.11380 (ok)
Feb 15 15:05:18 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11379 ->
0.45.11380
Feb 15 15:05:18 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action IPaddr_4_monitor_5000 (6) confirmed
Feb 15 15:05:18 fhbmplb1 cib: [15795]: info: write_cib_contents:io.c
Wrote version 0.45.11380 of the CIB to disk (digest:
537fd9703bde42e997662c8ce17e2082)
Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (25) start_0 on arpinglb_5 complete
Feb 15 15:05:20 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:58): 0.45.11380 -> 0.45.11381 (ok)
Feb 15 15:05:20 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11380 ->
0.45.11381
Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action arpinglb_5_start_0 (22) confirmed
Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info:
te_pseudo_action:actions.c Pseudo action 24 confirmed
Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 1:
arpinglb_5_monitor_120000 on fhbmplb1
Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op monitor on arpinglb_5 (interval=120000ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:20 fhbmplb1 cib: [15805]: info: write_cib_contents:io.c
Wrote version 0.45.11381 of the CIB to disk (digest:
b80af82378b5f39735f514e15b68d601)
Feb 15 15:05:20 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (26) monitor_120000 on arpinglb_5 complete
Feb 15 15:05:20 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:59): 0.45.11381 -> 0.45.11382 (ok)
Feb 15 15:05:20 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11381 ->
0.45.11382
Feb 15 15:05:20 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action arpinglb_5_monitor_120000 (1) confirmed
Feb 15 15:05:20 fhbmplb1 cib: [15810]: info: write_cib_contents:io.c
Wrote version 0.45.11382 of the CIB to disk (digest:
61ead707496c8b9e495030d8184edbf3)
Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout) ADMU0116I: Tool information is being logged in file
Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout)
/opt/IBM/WebSphere/AppServer/profiles/Dmgr01/logs/dmgr/stopServer.log
Feb 15 15:05:20 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout)
Feb 15 15:05:21 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout) ADMU0128I: Starting tool with the Dmgr01
profile ADMU3100I: Reading configuration for server: dmgr
Feb 15 15:05:25 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout) ADMU3201I: Server stop request issued. Waiting
for stop status.
Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout) ADMU4000I: Server dmgr stop completed.
Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout)
Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stderr) /dev/sdb1: ce
Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:stop:stdout) 18107
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (18) stop_0 on wsdepmgr_3 complete
Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:60): 0.45.11382 -> 0.45.11383 (ok)
Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11382 ->
0.45.11383
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action wsdepmgr_3_stop_0 (14) confirmed
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 3: IPaddr_2_stop_0 on fhbmplb1
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op stop on IPaddr_2 (interval=0ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: WARN: process_lrm_event:lrm.c
LRM operation (14) monitor_5000 on IPaddr_2 Cancelled
Feb 15 15:05:39 fhbmplb1 cib: [16198]: info: write_cib_contents:io.c
Wrote version 0.45.11383 of the CIB to disk (digest:
5ae0d2f817797c6374f8708c68bfcee4)
Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16220]: INFO: /sbin/route -n
del -host 10.3.35.32
Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output:
(IPaddr_2:stop:stderr) SIOCDELRT: No such process
Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16222]: INFO: /sbin/ifconfig
eth1:0 10.3.35.32 down
Feb 15 15:05:39 fhbmplb1 IPaddr[16199]: [16225]: INFO: IP Address
10.3.35.32 released
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (28) stop_0 on IPaddr_2 complete
Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:61): 0.45.11383 -> 0.45.11384 (ok)
Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11383 ->
0.45.11384
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action IPaddr_2_stop_0 (3) confirmed
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 13: IPaddr_2_start_0 on
fhbmplb1
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op start on IPaddr_2 (interval=0ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:39 fhbmplb1 cib: [16227]: info: write_cib_contents:io.c
Wrote version 0.45.11384 of the CIB to disk (digest:
7902ae928cad52cab1de6effbdcb5de5)
Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16274]: INFO: /sbin/ifconfig
eth1:0 10.3.35.32 netmask 255.255.255.0 broadcast 10.3.35.255
Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16279]: INFO: Sending
Gratuitous Arp for 10.3.35.32 on eth1:0 [eth1]
Feb 15 15:05:39 fhbmplb1 IPaddr[16226]: [16280]: INFO:
/usr/lib64/heartbeat/send_arp -i 500 -r 10 -p
/var/run/heartbeat/rsctmp/send_arp/send_arp-10.3.35.32 eth1 10.3.35.32
auto 10.3.35.32 ffffffffffff
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (29) start_0 on IPaddr_2 complete
Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:62): 0.45.11384 -> 0.45.11385 (ok)
Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11384 ->
0.45.11385
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action IPaddr_2_start_0 (13) confirmed
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 2: IPaddr_2_monitor_5000 on
fhbmplb1
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 15: wsdepmgr_3_start_0 on
fhbmplb1
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op monitor on IPaddr_2 (interval=5000ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op start on wsdepmgr_3 (interval=0ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:05:39 fhbmplb1 lrmd: [16291]: WARN: For LSB init script, no
additional parameters are needed.
Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout) Sending out ARP replies as 10.3.5.32
Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stderr) bind: Cannot assign requested address
Feb 15 15:05:39 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout) Starting deployment manager: dmgr
Feb 15 15:05:39 fhbmplb1 cib: [16289]: info: write_cib_contents:io.c
Wrote version 0.45.11385 of the CIB to disk (digest:
1bba9eb984b137be524e65dfc48ee4f9)
Feb 15 15:05:39 fhbmplb1 su: (to wasadmin) root on none
Feb 15 15:05:39 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (30) monitor_5000 on IPaddr_2 complete
Feb 15 15:05:39 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:63): 0.45.11385 -> 0.45.11386 (ok)
Feb 15 15:05:39 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11385 ->
0.45.11386
Feb 15 15:05:39 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action IPaddr_2_monitor_5000 (2) confirmed
Feb 15 15:05:39 fhbmplb1 cib: [16334]: info: write_cib_contents:io.c
Wrote version 0.45.11386 of the CIB to disk (digest:
a2e0897afb893ea208c6aa39fc6506cd)
Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout) ADMU0116I: Tool information is being logged in
file
Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout)
Feb 15 15:05:40 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout)
/opt/IBM/WebSphere/AppServer/profiles/Dmgr01/logs/dmgr/startServer.log
Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout) ADMU0128I: Starting tool with the Dmgr01 profile
Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout) ADMU3100I: Reading configuration for server: dmgr
Feb 15 15:05:41 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout)
Feb 15 15:05:44 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout) ADMU3200I: Server launched. Waiting for
initialization status.
Feb 15 15:06:12 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout) ADMU3000I: Server dmgr open for e-business;
process id is 16401
Feb 15 15:06:12 fhbmplb1 lrmd: [17670]: info: RA output:
(wsdepmgr_3:start:stdout)
Feb 15 15:06:12 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (31) start_0 on wsdepmgr_3 complete
Feb 15 15:06:12 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:64): 0.45.11386 -> 0.45.11387 (ok)
Feb 15 15:06:12 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11386 ->
0.45.11387
Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action wsdepmgr_3_start_0 (15) confirmed
Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info:
te_pseudo_action:actions.c Pseudo action 17 confirmed
Feb 15 15:06:12 fhbmplb1 tengine: [17686]: info:
send_rsc_command:actions.c Initiating action 4:
wsdepmgr_3_monitor_300000 on fhbmplb1
Feb 15 15:06:12 fhbmplb1 crmd: [17673]: info: do_lrm_rsc_op:lrm.c
Performing op monitor on wsdepmgr_3 (interval=300000ms,
key=4:a3a46044-31af-41f0-bf98-10646065e52d)
Feb 15 15:06:12 fhbmplb1 cib: [16811]: info: write_cib_contents:io.c
Wrote version 0.45.11387 of the CIB to disk (digest:
d3c40a0cd875c87af78426d115d3615f)
Feb 15 15:06:21 fhbmplb1 crmd: [17673]: info: process_lrm_event:lrm.c
LRM operation (32) monitor_300000 on wsdepmgr_3 complete
Feb 15 15:06:21 fhbmplb1 cib: [17669]: info: cib_diff_notify:notify.c
Update (client: 17673, call:65): 0.45.11387 -> 0.45.11388 (ok)
Feb 15 15:06:21 fhbmplb1 mgmtd: [17674]: debug: update cib finished
Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info:
te_update_diff:callbacks.c Processing diff (cib_update): 0.45.11387 ->
0.45.11388
Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info:
match_graph_event:events.c Action wsdepmgr_3_monitor_300000 (4) confirmed
Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: run_graph:graph.c
Transition 4: (Complete=20, Pending=0, Fired=0, Skipped=0, Incomplete=0)
Feb 15 15:06:21 fhbmplb1 tengine: [17686]: info: notify_crmd:actions.c
Transition 4 status: te_complete - (null)
Feb 15 15:06:21 fhbmplb1 crmd: [17673]: info: do_state_transition:fsa.c
fhbmplb1: State transition S_TRANSITION_ENGINE -> S_IDLE [
input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=do_msg_route ]
Feb 15 15:06:21 fhbmplb1 cib: [17102]: info: write_cib_contents:io.c
Wrote version 0.45.11388 of the CIB to disk (digest:
b68ab97e9a799fef743e211d7b8625ff)
but wsdepmgr_3 and arpinglb_5 are also brought down and up.
There are two things I do not understand:
1. why does the timeout occur on the IPaddr resources (they occur
sporadically without any relation to the network load)
2. if the IPaddr resource can (and intended to) be recovered by crm, why
the other resources in the resource group are also
cycled.
The cluster version is: 2.0.5-7.10
The cib.xml is attached.
Thanks in advance for any help,
Imre
<cib generated="true" admin_epoch="0" have_quorum="true" num_peers="2" cib_feature_revision="1.3" epoch="47" num_updates="11585" cib-last-written="Mon Feb 23 11:42:11 2009" crm-debug-origin="create_node_entry" ccm_transition="2" dc_uuid="c42562a1-fbc4-4180-a3c2-41752bbe56d5">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-symmetric_cluster" name="symmetric_cluster" value="true"/>
<nvpair id="cib-bootstrap-options-no_quorum_policy" name="no_quorum_policy" value="stop"/>
<nvpair id="cib-bootstrap-options-default_resource_stickiness" name="default_resource_stickiness" value="0"/>
<nvpair id="cib-bootstrap-options-stonith_enabled" name="stonith_enabled" value="false"/>
<nvpair id="cib-bootstrap-options-stop_orphan_resources" name="stop_orphan_resources" value="true"/>
<nvpair id="cib-bootstrap-options-stop_orphan_actions" name="stop_orphan_actions" value="true"/>
<nvpair id="cib-bootstrap-options-remove_after_stop" name="remove_after_stop" value="false"/>
<nvpair id="cib-bootstrap-options-transition_idle_timeout" name="transition_idle_timeout" value="5min"/>
<nvpair id="cib-bootstrap-options-is_managed_default" name="is_managed_default" value="true"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes>
<node id="c42562a1-fbc4-4180-a3c2-41752bbe56d5" uname="fhbmplb1" type="normal"/>
<node id="5c447dff-4b89-422b-92f6-2985663eb5c4" uname="fhbmplb2" type="normal"/>
</nodes>
<resources>
<group id="group_1">
<instance_attributes id="group_1">
<attributes>
<nvpair name="resource_stickiness" id="group_1-resource_stickiness" value="INFINITY"/>
<nvpair id="group_1-resource" name="resource" value="INFINITY"/>
</attributes>
</instance_attributes>
<primitive class="ocf" id="Filesystem_1" provider="heartbeat" type="Filesystem">
<operations>
<op id="Filesystem_1_mon" interval="120s" name="monitor" timeout="60s"/>
</operations>
<instance_attributes id="Filesystem_1_inst_attr">
<attributes>
<nvpair id="Filesystem_1_attr_0" name="device" value="/dev/sdb1"/>
<nvpair id="Filesystem_1_attr_1" name="directory" value="/opt/IBM"/>
<nvpair id="Filesystem_1_attr_2" name="fstype" value="ext3"/>
</attributes>
</instance_attributes>
</primitive>
<primitive class="ocf" id="IPaddr_2" provider="heartbeat" type="IPaddr">
<operations>
<op id="IPaddr_2_mon" interval="5s" name="monitor" timeout="5s"/>
</operations>
<instance_attributes id="IPaddr_2_inst_attr">
<attributes>
<nvpair id="IPaddr_2_attr_0" name="ip" value="10.3.35.32"/>
<nvpair id="IPaddr_2_attr_1" name="netmask" value="24"/>
</attributes>
</instance_attributes>
</primitive>
<primitive class="lsb" id="wsdepmgr_3" provider="heartbeat" type="wsdepmgr">
<operations>
<op id="wsdepmgr_3_stop" name="stop" timeout="120s"/>
<op id="wsdepmgr_3_start" name="start" timeout="120s"/>
<op id="wsdepmgr_3_mon" interval="300s" name="monitor" timeout="120s"/>
</operations>
</primitive>
</group>
<group id="group_2">
<instance_attributes id="group_2">
<attributes>
<nvpair name="resource_stickiness" id="group_2-resource_stickiness" value="INFINITY"/>
<nvpair id="group_2-resource" name="resource" value="INFINITY"/>
</attributes>
</instance_attributes>
<primitive class="ocf" id="IPaddr_4" provider="heartbeat" type="IPaddr">
<operations>
<op id="IPaddr_4_mon" interval="5s" name="monitor" timeout="5s"/>
</operations>
<instance_attributes id="IPaddr_4_inst_attr">
<attributes>
<nvpair id="IPaddr_4_attr_0" name="ip" value="10.4.5.32"/>
<nvpair id="IPaddr_4_attr_1" name="netmask" value="24"/>
</attributes>
</instance_attributes>
</primitive>
<primitive class="lsb" id="arpinglb_5" provider="heartbeat" type="arpinglb">
<operations>
<op id="arpinglb_5_mon" interval="120s" name="monitor" timeout="60s"/>
</operations>
</primitive>
</group>
</resources>
<constraints>
<rsc_location id="rsc_location_group_1" rsc="group_1">
<rule id="prefered_location_group_1" score="100">
<expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="fhbmpswapp1"/>
</rule>
</rsc_location>
<rsc_location id="rsc_location_group_2" rsc="group_2">
<rule id="prefered_location_group_2" score="100">
<expression attribute="#uname" id="prefered_location_group_2_expr" operation="eq" value="fhbmpswapp1"/>
</rule>
</rsc_location>
<rsc_order id="order_IPaddr_2_Filesystem_1" from="IPaddr_2" action="start" type="after" to="Filesystem_1"/>
<rsc_order id="order_wsdepmgr_3_IPaddr_2" from="wsdepmgr_3" action="start" type="after" to="IPaddr_2"/>
</constraints>
</configuration>
</cib>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems