Ciao, I just opened a bugzilla entry as you suggested; I hope everything is fine.
thanks
cristina

On Aug 17, 2009, at 11:47 PM, Dejan Muhamedagic wrote:
Hi,

On Mon, Aug 17, 2009 at 03:16:16PM +0200, Cristina Bulfon wrote:
> Ciao,
> during my holiday I had a problem with heartbeat. Checking the log file I
> found the following lines:
>
> - On ACTIVE machine
> heartbeat[27868]: 2009/07/19_08:33:43 info: Daily informational memory statistics
> heartbeat[27868]: 2009/07/19_08:33:43 info: MSG stats: 7/663233 ms age 0 [pid27868/MST_CONTROL]
> heartbeat[27868]: 2009/07/19_08:33:43 info: cl_malloc stats: 0/0 0/0 [pid27868/MST_CONTROL]
> heartbeat[27868]: 2009/07/19_08:33:43 info: RealMalloc stats: 0 total malloc bytes. pid [27868/MST_CONTROL]
> heartbeat[27868]: 2009/07/19_08:33:43 info: Current arena value: 0
> ......................
> heartbeat[27868]: 2009/07/25_08:34:02 info: MSG stats: 0/0 ms age 5158152440 [pid27875/HBREAD]
> heartbeat[27868]: 2009/07/25_08:34:02 info: cl_malloc stats: 0/0 0/0 [pid27875/HBREAD]
> heartbeat[27868]: 2009/07/25_08:34:02 info: RealMalloc stats: 0 total malloc bytes. pid [27875/HBREAD]
> heartbeat[27868]: 2009/07/25_08:34:02 info: Current arena value: 0
> heartbeat[27868]: 2009/07/25_08:34:02 info: These are nothing to worry about.
> lrmd[27881]: 2009/07/26_02:29:11 WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x519bc0)
> lrmd[27881]: 2009/07/26_02:29:11 info: G_SIG_dispatch: started at 522265947 should have started at 522265847
> ........
> heartbeat[27868]: 2009/08/01_08:34:23 info: MSG stats: 0/0 ms age 5762952450 [pid27875/HBREAD]
> heartbeat[27868]: 2009/08/01_08:34:23 info: cl_malloc stats: 0/0 0/0 [pid27875/HBREAD]
> heartbeat[27868]: 2009/08/01_08:34:23 info: RealMalloc stats: 0 total malloc bytes. pid [27875/HBREAD]
> heartbeat[27868]: 2009/08/01_08:34:23 info: Current arena value: 0
> heartbeat[27868]: 2009/08/01_08:34:23 info: These are nothing to worry about.
> - On STANDBY machine
> heartbeat[27879]: 2009/07/19_08:34:13 info: Daily informational memory statistics
> heartbeat[27879]: 2009/07/19_08:34:13 info: MSG stats: 9/663189 ms age 0 [pid27879/MST_CONTROL]
> heartbeat[27879]: 2009/07/19_08:34:13 info: cl_malloc stats: 0/0 0/0 [pid27879/MST_CONTROL]
> heartbeat[27879]: 2009/07/19_08:34:13 info: RealMalloc stats: 0 total malloc bytes. pid [27879/MST_CONTROL]
> heartbeat[27879]: 2009/07/19_08:34:13 info: Current arena value: 0
> ............................
> heartbeat[27879]: 2009/07/25_08:34:21 info: MSG stats: 0/0 ms age 5158086760 [pid27886/HBREAD]
> heartbeat[27879]: 2009/07/25_08:34:21 info: cl_malloc stats: 0/0 0/0 [pid27886/HBREAD]
> heartbeat[27879]: 2009/07/25_08:34:21 info: RealMalloc stats: 0 total malloc bytes. pid [27886/HBREAD]
> heartbeat[27879]: 2009/07/25_08:34:21 info: Current arena value: 0
> heartbeat[27879]: 2009/07/25_08:34:21 info: These are nothing to worry about.
> ............................
> heartbeat[27879]: 2009/08/01_09:11:07 WARN: node afsitfs3.roma1.infn.it: is dead
> heartbeat[27879]: 2009/08/01_09:11:07 info: Link afsitfs3.roma1.infn.it:eth1 dead.
> pingd[27889]: 2009/08/01_09:11:07 notice: pingd_nstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
> heartbeat[27879]: 2009/08/01_09:11:07 info: Link afsitfs3.roma1.infn.it:eth0 dead.
> pingd[27889]: 2009/08/01_09:11:07 notice: pingd_lstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
> pingd[27889]: 2009/08/01_09:11:07 notice: pingd_nstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
> crmd[27895]: 2009/08/01_09:11:07 notice: crmd_ha_status_callback: Status update: Node afsitfs3.roma1.infn.it now has status [dead]
> pingd[27889]: 2009/08/01_09:11:07 notice: pingd_lstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
> pingd[27889]: 2009/08/01_09:11:07 notice: pingd_nstatus_callback: Status update: Ping node afsitfs3.roma1.infn.it now has status [dead]
> cib[27891]: 2009/08/01_09:11:07 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> cib[27891]: 2009/08/01_09:11:07 info: mem_handle_event: no mbr_track info
> cib[27891]: 2009/08/01_09:11:07 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> cib[27891]: 2009/08/01_09:11:07 info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> ccm[27890]: 2009/08/01_09:11:07 info: Break tie for 2 nodes cluster
> crmd[27895]: 2009/08/01_09:11:07 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> cib[27891]: 2009/08/01_09:11:07 info: cib_ccm_msg_callback: LOST: afsitfs3.roma1.infn.it
> crmd[27895]: 2009/08/01_09:11:07 info: mem_handle_event: no mbr_track info
> cib[27891]: 2009/08/01_09:11:07 info: cib_ccm_msg_callback: PEER: afsitfs4.roma1.infn.it
> crmd[27895]: 2009/08/01_09:11:07 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> crmd[27895]: 2009/08/01_09:11:07 info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> crmd[27895]: 2009/08/01_09:11:07 info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
> crmd[27895]: 2009/08/01_09:11:07 info: erase_node_from_join: Removed dead node afsitfs3.roma1.infn.it from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
> crmd[27895]: 2009/08/01_09:11:07 info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> crmd[27895]: 2009/08/01_09:11:07 info: ccm_event_detail: CURRENT: afsitfs4.roma1.infn.it [nodeid=1, born=3]
> crmd[27895]: 2009/08/01_09:11:07 info: ccm_event_detail: LOST: afsitfs3.roma1.infn.it [nodeid=0, born=2]
> tengine[27909]: 2009/08/01_09:11:07 WARN: match_down_event: No match for shutdown action on 586817af-703a-4eff-ac9b-b96de063493a
> tengine[27909]: 2009/08/01_09:11:07 info: extract_event: Stonith/shutdown of 586817af-703a-4eff-ac9b-b96de063493a not matched
> crmd[27895]: 2009/08/01_09:11:07 info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> tengine[27909]: 2009/08/01_09:11:07 info: update_abort_priority: Abort priority upgraded to 1000000
> crmd[27895]: 2009/08/01_09:11:07 info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> pengine[27910]: 2009/08/01_09:11:07 info: determine_online_status: Node afsitfs4.roma1.infn.it is online
> pengine[27910]: 2009/08/01_09:11:07 notice: group_print: Resource Group: group_1
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: IPaddr_141_108_26_27 (ocf::heartbeat:IPaddr): Stopped
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_2 (ocf::heartbeat:Filesystem): Stopped
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_3 (ocf::heartbeat:Filesystem): Stopped
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_4 (ocf::heartbeat:Filesystem): Stopped
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_5 (ocf::heartbeat:Filesystem): Stopped
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_6 (ocf::heartbeat:Filesystem): Stopped
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: Filesystem_7 (ocf::heartbeat:Filesystem): Stopped
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: afs_8 (lsb:afs): Stopped
> pengine[27910]: 2009/08/01_09:11:07 notice: native_print: CheckSun_9 (stonith:external/sbd): Stopped
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource IPaddr_141_108_26_27 cannot run anywhere
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_2 cannot run anywhere
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_3 cannot run anywhere
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_4 cannot run anywhere
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_5 cannot run anywhere
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_6 cannot run anywhere
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource Filesystem_7 cannot run anywhere
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource afs_8 cannot run anywhere
> pengine[27910]: 2009/08/01_09:11:07 WARN: native_color: Resource CheckSun_9 cannot run anywhere
> crmd[27895]: 2009/08/01_09:11:07 info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> tengine[27909]: 2009/08/01_09:11:07 info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-warn-3617.bz2
> tengine[27909]: 2009/08/01_09:11:07 info: unpack_graph: Unpacked transition 3: 0 actions in 0 synapses
> tengine[27909]: 2009/08/01_09:11:07 info: run_graph: Transition 3: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> crmd[27895]: 2009/08/01_09:11:07 info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> tengine[27909]: 2009/08/01_09:11:07 info: notify_crmd: Transition 3 status: te_complete - <null>
> pengine[27910]: 2009/08/01_09:11:07 WARN: process_pe_message: Transition 3: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-3617.bz2
> pengine[27910]: 2009/08/01_09:11:07 info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> cib[27891]: 2009/08/01_09:14:36 info: cib_stats: Processed 3 operations (13333.00us average, 0% utilization) in the last 10min
>
> I don't understand which was the problem on the ACTIVE machine

It was not reachable. It's not clear why, though, and this node can't say why.

> and I don't understand either the reason why the service didn't start on the STANDBY machine.

Not sure. It's as if there was no stonith resource defined. It looks like you have shared storage, so you should have fencing set up, but one can't say without looking at the configuration.
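The logs themselves point at a starting place: the stonith resource (CheckSun_9, stonith:external/sbd) is Stopped, and the PEngine reports configuration WARNINGs. A first diagnostic pass could look like the sketch below. This is a hedged sketch, not the poster's actual configuration: the device path /dev/sdX1 is a placeholder, and the crm shell is only available if your Pacemaker version ships it.

```shell
# Run the check the PEngine itself suggests, against the live CIB:
crm_verify -L -V

# One-shot overview of node and resource state, including CheckSun_9:
crm_mon -1

# For comparison, this is roughly how an sbd-based stonith resource
# is defined in the crm shell; /dev/sdX1 is a placeholder device
# and must match the sbd partition on the shared storage.
crm configure primitive CheckSun_9 stonith:external/sbd \
    params sbd_device="/dev/sdX1"
crm configure property stonith-enabled="true"
```

With stonith-enabled set but the stonith resource unable to run anywhere, the CRM cannot fence the dead node, which is consistent with the "cannot run anywhere" warnings above.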
At any rate, the CRM didn't even try to fence the other node.

> Can anybody help to debug and solve the problem?

You have to prepare a report with hb_report. If it's too big, don't post it but open a new bugzilla.

Thanks,

Dejan

> thanks
> cristina

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
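[Editor's note] Dejan's hb_report suggestion usually boils down to a single command run on one cluster node; the start time and destination below are placeholders chosen to cover the 2009-08-01 incident:

```shell
# Collect logs, the CIB, and PEngine inputs from all nodes,
# starting shortly before the failed failover:
hb_report -f "2009-08-01 08:00" /tmp/report-afsitfs
# The resulting archive (/tmp/report-afsitfs.tar.bz2 or similar)
# can then be attached to the bugzilla entry.
```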
