Ciao,

during my holiday I had a problem with Heartbeat. Checking the log files, I found the following lines:

- On ACTIVE machine

heartbeat[27868]: 2009/07/19_08:33:43 info: Daily informational memory statistics
heartbeat[27868]: 2009/07/19_08:33:43 info: MSG stats: 7/663233 ms age 0 [pid27868/MST_CONTROL]
heartbeat[27868]: 2009/07/19_08:33:43 info: cl_malloc stats: 0/0 0/0 [pid27868/MST_CONTROL]
heartbeat[27868]: 2009/07/19_08:33:43 info: RealMalloc stats: 0 total malloc bytes. pid [27868/MST_CONTROL]
heartbeat[27868]: 2009/07/19_08:33:43 info: Current arena value: 0
......................
heartbeat[27868]: 2009/07/25_08:34:02 info: MSG stats: 0/0 ms age 5158152440 [pid27875/HBREAD]
heartbeat[27868]: 2009/07/25_08:34:02 info: cl_malloc stats: 0/0 0/0 [pid27875/HBREAD]
heartbeat[27868]: 2009/07/25_08:34:02 info: RealMalloc stats: 0 total malloc bytes. pid [27875/HBREAD]
heartbeat[27868]: 2009/07/25_08:34:02 info: Current arena value: 0
heartbeat[27868]: 2009/07/25_08:34:02 info: These are nothing to worry about.
lrmd[27881]: 2009/07/26_02:29:11 WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x519bc0)
lrmd[27881]: 2009/07/26_02:29:11 info: G_SIG_dispatch: started at 522265947 should have started at 522265847
........
heartbeat[27868]: 2009/08/01_08:34:23 info: MSG stats: 0/0 ms age 5762952450 [pid27875/HBREAD]
heartbeat[27868]: 2009/08/01_08:34:23 info: cl_malloc stats: 0/0 0/0 [pid27875/HBREAD]
heartbeat[27868]: 2009/08/01_08:34:23 info: RealMalloc stats: 0 total malloc bytes. pid [27875/HBREAD]
heartbeat[27868]: 2009/08/01_08:34:23 info: Current arena value: 0
heartbeat[27868]: 2009/08/01_08:34:23 info: These are nothing to worry about.
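[Editor's note: the lrmd `G_SIG_dispatch ... delayed` warning above usually indicates scheduling latency on the node (heavy load, swapping, or slow I/O) rather than a Heartbeat bug, so it is worth checking how often it occurs and whether a burst coincides with the failover. A minimal sketch for counting these warnings, shown here against an inline excerpt standing in for the real log; in practice point `LOG` at your actual ha-log path:]

```shell
#!/bin/sh
# Inline excerpt standing in for the real Heartbeat log (hypothetical content)
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
lrmd[27881]: 2009/07/26_02:29:11 WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called
heartbeat[27868]: 2009/08/01_08:34:23 info: Current arena value: 0
EOF

# Show each dispatch-delay warning with its timestamp, then count them;
# a cluster of these around the failover time is a load/latency hint
grep 'G_SIG_dispatch.*delayed' "$LOG"
delays=$(grep -c 'G_SIG_dispatch.*delayed' "$LOG")
echo "dispatch delay warnings: $delays"
rm -f "$LOG"
```

On a live node you would run the same greps directly against the system's Heartbeat log instead of the temp file.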

- On STANDBY machine

heartbeat[27879]: 2009/07/19_08:34:13 info: Daily informational memory statistics
heartbeat[27879]: 2009/07/19_08:34:13 info: MSG stats: 9/663189 ms age 0 [pid27879/MST_CONTROL]
heartbeat[27879]: 2009/07/19_08:34:13 info: cl_malloc stats: 0/0 0/0 [pid27879/MST_CONTROL]
heartbeat[27879]: 2009/07/19_08:34:13 info: RealMalloc stats: 0 total malloc bytes. pid [27879/MST_CONTROL]
heartbeat[27879]: 2009/07/19_08:34:13 info: Current arena value: 0
............................
heartbeat[27879]: 2009/07/25_08:34:21 info: MSG stats: 0/0 ms age 5158086760 [pid27886/HBREAD]
heartbeat[27879]: 2009/07/25_08:34:21 info: cl_malloc stats: 0/0 0/0 [pid27886/HBREAD]
heartbeat[27879]: 2009/07/25_08:34:21 info: RealMalloc stats: 0 total malloc bytes. pid [27886/HBREAD]
heartbeat[27879]: 2009/07/25_08:34:21 info: Current arena value: 0
heartbeat[27879]: 2009/07/25_08:34:21 info: These are nothing to worry about.
............................
heartbeat[27879]: 2009/08/01_09:11:07 WARN: node afsitfs3.roma1.infn.it: is dead
heartbeat[27879]: 2009/08/01_09:11:07 info: Link afsitfs3.roma1.infn.it:eth1 dead.
pingd[27889]: 2009/08/01_09:11:07 notice: pingd_nstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
heartbeat[27879]: 2009/08/01_09:11:07 info: Link afsitfs3.roma1.infn.it:eth0 dead.
pingd[27889]: 2009/08/01_09:11:07 notice: pingd_lstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
pingd[27889]: 2009/08/01_09:11:07 notice: pingd_nstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
crmd[27895]: 2009/08/01_09:11:07 notice: crmd_ha_status_callback: Status update: Node afsitfs3.roma1.infn.it now has status [dead]
pingd[27889]: 2009/08/01_09:11:07 notice: pingd_lstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
pingd[27889]: 2009/08/01_09:11:07 notice: pingd_nstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
cib[27891]: 2009/08/01_09:11:07 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
cib[27891]: 2009/08/01_09:11:07 info: mem_handle_event: no mbr_track info
cib[27891]: 2009/08/01_09:11:07 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
cib[27891]: 2009/08/01_09:11:07 info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
ccm[27890]: 2009/08/01_09:11:07 info: Break tie for 2 nodes cluster
crmd[27895]: 2009/08/01_09:11:07 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
cib[27891]: 2009/08/01_09:11:07 info: cib_ccm_msg_callback: LOST: afsitfs3.roma1.infn.it
crmd[27895]: 2009/08/01_09:11:07 info: mem_handle_event: no mbr_track info
cib[27891]: 2009/08/01_09:11:07 info: cib_ccm_msg_callback: PEER: afsitfs4.roma1.infn.it
crmd[27895]: 2009/08/01_09:11:07 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
crmd[27895]: 2009/08/01_09:11:07 info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
crmd[27895]: 2009/08/01_09:11:07 info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
crmd[27895]: 2009/08/01_09:11:07 info: erase_node_from_join: Removed dead node afsitfs3.roma1.infn.it from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
crmd[27895]: 2009/08/01_09:11:07 info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
crmd[27895]: 2009/08/01_09:11:07 info: ccm_event_detail: CURRENT: afsitfs4.roma1.infn.it [nodeid=1, born=3]
crmd[27895]: 2009/08/01_09:11:07 info: ccm_event_detail: LOST: afsitfs3.roma1.infn.it [nodeid=0, born=2]
tengine[27909]: 2009/08/01_09:11:07 WARN: match_down_event: No match for shutdown action on 586817af-703a-4eff-ac9b-b96de063493a
tengine[27909]: 2009/08/01_09:11:07 info: extract_event: Stonith/ shutdown of 586817af-703a-4eff-ac9b-b96de063493a not matched
crmd[27895]: 2009/08/01_09:11:07 info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
tengine[27909]: 2009/08/01_09:11:07 info: update_abort_priority: Abort priority upgraded to 1000000
crmd[27895]: 2009/08/01_09:11:07 info: do_state_transition: All 1 cluster nodes are eligible to run resources.
pengine[27910]: 2009/08/01_09:11:07 info: determine_online_status: Node afsitfs4.roma1.infn.it is online
pengine[27910]: 2009/08/01_09:11:07 notice: group_print: Resource Group: group_1
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: IPaddr_141_108_26_27 (ocf::heartbeat:IPaddr): Stopped
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_2 (ocf::heartbeat:Filesystem): Stopped
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_3 (ocf::heartbeat:Filesystem): Stopped
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_4 (ocf::heartbeat:Filesystem): Stopped
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_5 (ocf::heartbeat:Filesystem): Stopped
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_6 (ocf::heartbeat:Filesystem): Stopped
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_7 (ocf::heartbeat:Filesystem): Stopped
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: afs_8 (lsb:afs): Stopped
pengine[27910]: 2009/08/01_09:11:07 notice: native_print: CheckSun_9 (stonith:external/sbd): Stopped
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource IPaddr_141_108_26_27 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_2 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_3 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_4 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_5 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_6 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_7 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource afs_8 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource CheckSun_9 cannot run anywhere
crmd[27895]: 2009/08/01_09:11:07 info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
tengine[27909]: 2009/08/01_09:11:07 info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-warn-3617.bz2
tengine[27909]: 2009/08/01_09:11:07 info: unpack_graph: Unpacked transition 3: 0 actions in 0 synapses
tengine[27909]: 2009/08/01_09:11:07 info: run_graph: Transition 3: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
crmd[27895]: 2009/08/01_09:11:07 info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
tengine[27909]: 2009/08/01_09:11:07 info: notify_crmd: Transition 3 status: te_complete - <null>
pengine[27910]: 2009/08/01_09:11:07 WARN: process_pe_message: Transition 3: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-3617.bz2
pengine[27910]: 2009/08/01_09:11:07 info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
cib[27891]: 2009/08/01_09:14:36 info: cib_stats: Processed 3 operations (13333.00us average, 0% utilization) in the last 10min
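[Editor's note: the pengine lines above show every resource ending up "cannot run anywhere" on the surviving node, which typically means a location constraint (for example a pingd/connectivity score rule) forbade afsitfs4, rather than the resources themselves failing; the log's own hint is to run `crm_verify -L`. To list exactly which resources were refused, a sketch like this can help, shown against an inline excerpt standing in for the real log:]

```shell
#!/bin/sh
# Inline excerpt standing in for the real Heartbeat log (hypothetical content)
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource IPaddr_141_108_26_27 cannot run anywhere
pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource afs_8 cannot run anywhere
EOF

# Extract the names of the resources the policy engine could not place
stuck=$(sed -n 's/.*Resource \([^ ]*\) cannot run anywhere.*/\1/p' "$LOG")
echo "$stuck"
rm -f "$LOG"
```

On a live node you would run the sed line against the system's Heartbeat log, then compare the listed resources against the location/pingd constraints that `crm_verify -L` complains about.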


I don't understand what the problem was on the ACTIVE machine, and I also don't understand why the service didn't start on the STANDBY machine.

Can anybody help me debug and solve the problem?

thanks

cristina



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
