[Linux-HA] heartbeat dying

Gary Schlachter Fri, 11 Jan 2008 07:24:00 -0800

I have a problem with heartbeat dying. I have a 3-node cluster running HA 2.0.8 on Fedora Core 1. The nodes provide a single IP address resource and use eth0 as the heartbeat medium. If I disconnect the eth0 cable from the node that is providing the IP address, one of the other nodes correctly begins providing it. However, shortly after the cable is disconnected, the heartbeat process (and others) die. The key area of the ha-debug log looks like the following:

pengine[4293]: 2008/01/11_09:50:22 info: determine_online_status: Node loneranger.us.big.net is online
pengine[4293]: 2008/01/11_09:50:22 info: native_print: SharedIP (heartbeat::ocf:IPaddr): Started loneranger.us.big.net
pengine[4293]: 2008/01/11_09:50:22 notice: StopRsc: loneranger.us.big.net Stop SharedIP
crmd[9543]: 2008/01/11_09:50:22 info: do_state_transition: loneranger.us.big.net: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
pengine[4293]: 2008/01/11_09:50:22 info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-137.bz2
tengine[4292]: 2008/01/11_09:50:22 info: unpack_graph: Unpacked transition 0: 1 actions in 1 synapses
tengine[4292]: 2008/01/11_09:50:22 info: send_rsc_command: Initiating action 3: SharedIP_stop_0 on loneranger.us.big.net
crmd[9543]: 2008/01/11_09:50:22 info: do_lrm_rsc_op: Performing op=SharedIP_stop_0 key=3:0:994066a9-4cae-49a4-abad-37f3e0b84b3e)
IPaddr[4300]: 2008/01/11_09:50:22 INFO: /sbin/ifconfig eth0:0 10.1.2.50 down
lrmd[9540]: 2008/01/11_09:50:22 info: RA output: (SharedIP:stop:stderr) SIOCDELRT: No such process
crmd[9543]: 2008/01/11_09:50:22 info: process_lrm_event: LRM operation SharedIP_stop_0 (call=4, rc=0) complete
cib[9539]: 2008/01/11_09:50:22 info: cib_diff_notify: Update (client:9543, call:32): 0.30.317 -> 0.30.318 (ok)
cib[4315]: 2008/01/11_09:50:22 info: write_cib_contents: Wrote version 0.30.318 of the CIB to disk (digest: ad7329b3cddc6a9bbd96deb332a3d08f)
tengine[4292]: 2008/01/11_09:50:22 info: te_update_diff: Processing diff (cib_update): 0.30.317 -> 0.30.318
tengine[4292]: 2008/01/11_09:50:22 info: match_graph_event: Action SharedIP_stop_0 (3) confirmed on c8608d41-66b2-4115-9043-4a8423b0d562
tengine[4292]: 2008/01/11_09:50:22 info: run_graph: Transition 0: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[4292]: 2008/01/11_09:50:22 info: notify_crmd: Transition 0 status: te_complete - <null>
crmd[9543]: 2008/01/11_09:50:22 info: do_state_transition: loneranger.us.big.net: State transition S_TRANSITION_ENGINE -> S_IDLE [input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Cannot write to media pipe0: Resource temporarily unavailable
heartbeat[9527]: 2008/01/11_09:54:27 ERROR: Shutting down.

The last two messages repeat for a very long time, and then most of the daemons eventually stop.



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
