Hi,

We have a two-node setup running heartbeat version 2.0.8-1. On one node,
heartbeat exited with "Emergency Shutdown". We restarted it, and after the
restart heartbeat on the other node exited for roughly the same reason.
Can someone please help us identify the issue? Are these known bugs, and
if so, have they been fixed in later releases?

Any help would be greatly appreciated.

Node configuration:

sh-3.00# uname -a
Linux S-FL2-PLS-NAC 2.6.17-1.2142_FC4smp #1 SMP Sat Aug 12 08:16:08 EDT 2006
i686 i686 i386 GNU/Linux

Following are the logs from the first node:

Mar  3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
is filling up (197 messages in queue)
Mar  3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
is filling up (198 messages in queue)
Mar  3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
is filling up (199 messages in queue)
Mar  3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar  3 14:47:10 S-FL2-PLS-NAC last message repeated 7 times
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7
for s-fl2-sls-nac.yardi.com: seqno too low
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207,
lowseq=7,ackseq=0,lastmsg=6
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7
for s-fl2-sls-nac.yardi.com: seqno too low
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207,
lowseq=7,ackseq=0,lastmsg=6
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8
for s-fl2-sls-nac.yardi.com: seqno too low
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
lowseq=8,ackseq=0,lastmsg=7
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8
for s-fl2-sls-nac.yardi.com: seqno too low
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
lowseq=8,ackseq=0,lastmsg=7
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8
for s-fl2-sls-nac.yardi.com: seqno too low
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
lowseq=8,ackseq=0,lastmsg=7
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8
for s-fl2-sls-nac.yardi.com: seqno too low
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
lowseq=8,ackseq=0,lastmsg=7
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: lowseq cannnot be
greater than ackseq
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist->ackseq =10,
old_ackseq=0
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist->lowseq =201,
hist->hiseq=208, send_cluster_msg_level=0
Mar  3 14:47:10 S-FL2-PLS-NAC ccm: [5284]: ERROR: Lost connection to
heartbeat service. Need to bail out.
Mar  3 14:47:10 S-FL2-PLS-NAC cib: [5285]: ERROR: cib_ha_connection_destroy:
Heartbeat connection lost!  Exiting.
Mar  3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: ERROR: Disconnected with
heartbeat daemon
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: CRIT: crmd_ha_msg_dispatch: Lost
connection to heartbeat service.
Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: Lost connection to
heartbeat service.
Mar  3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: notice:
/usr/lib/heartbeat/stonithd normally quit.
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: mem_handle_func:IPC
broken, ccm is dead before the client!
Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: attrd_ha_dispatch: Lost
connection to heartbeat service.
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: ccm_dispatch: CCM
connection appears to have failed: rc=-1.
Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT:
attrd_ha_connection_destroy: Lost connection to heartbeat service!
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_log: [[FSA]] Input
I_ERROR from ccm_dispatch() received in state (S_PENDING)
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition:
s-fl2-pls-nac.yardi.com: State transition S_PENDING -> S_RECOVERY [
input=I_ERROR cause=C_CCM_CALLBACK origin=ccm_dispatch ]
Mar  3 14:47:10 S-FL2-PLS-NAC cib: [5285]: info: uninitializeCib: The CIB
has been deallocated.
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_recover: Action
A_RECOVER (0000000001000000) not supported
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_log: [[FSA]] Input
I_STOP from do_recover() received in state (S_RECOVERY)
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition:
s-fl2-pls-nac.yardi.com: State transition S_RECOVERY -> S_STOPPING [
input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ]
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_dc_release: DC role
released
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: WARN: do_log: [[FSA]] Input
I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition:
s-fl2-pls-nac.yardi.com: State transition S_STOPPING -> S_TERMINATE [
input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_shutdown ]
Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: cib_native_msgready:
Message pending on command channel [5285]
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: verify_stopped: Checking
for active resources before exit
Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: crm_log_message_adv:
#========= cib:cmd message start ==========#
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: verify_stopped: Checking
for active resources before exit
Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: MSG: No message to dump
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_lrm_control:
Disconnected from the LRM
Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: cib_native_msgready:
Message pending on command channel [5285]
Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: info: cib_native_msgready: Lost
connection to the CIB service [5285].
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_ha_control:
Disconnected from Heartbeat
Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: crm_log_message_adv:
#========= cib:cmd message start ==========#
Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: cib_native_dispatch: Lost
connection to the CIB service [5285/callback].
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_cib_control:
Disconnecting CIB
Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: MSG: No message to dump
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info:
crmd_cib_connection_destroy: Connection to the CIB terminated...
Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: CRIT: cib_native_dispatch: Lost
connection to the CIB service [5285/callback].
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_exit: Performing
A_EXIT_0 - gracefully exiting the CRMd
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_exit: Could not
recover from internal error
Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_exit: [crmd] stopped
(2)
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Emergency Shutdown:
Master Control process died.
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5057 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5062 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5063 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5064 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5065 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5066 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5067 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5068 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5069 with
SIGTERM
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Emergency
Shutdown(MCP dead): Killing ourselves.
Mar  3 16:00:12 S-FL2-PLS-NAC auditd[2341]: Audit daemon rotating log files
Mar  3 19:06:54 S-FL2-PLS-NAC auditd[2341]: Audit daemon rotating log files
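For reference, the 200-message ceiling in the "Message hist queue is filling
up" errors matches the hiseq/lowseq pairs in the "hist information" lines
(207 - 7 = 200, 208 - 8 = 200), while the peer's ackseq stays at 0. Below is a
small Python sketch for pulling those numbers out of a full syslog. It assumes
the record format shown above, and it assumes the queue depth is simply
hiseq - lowseq; that is our reading of the numbers, not something we have
confirmed in the heartbeat source. The function name hist_windows is our own.

```python
import re

# Matches heartbeat's "hist information" records, e.g.
# "info: hiseq =207, lowseq=7,ackseq=0,lastmsg=6".
# \s* around '=' and ',' tolerates the padding and line wrapping seen above.
HIST_RE = re.compile(
    r"hiseq\s*=\s*(\d+),\s*lowseq\s*=\s*(\d+),\s*ackseq\s*=\s*(\d+)"
)


def hist_windows(log_text):
    """Return (hiseq, lowseq, ackseq, depth) for every hist record found.

    ASSUMPTION: depth = hiseq - lowseq, i.e. the retransmit history holds
    one packet per sequence number in [lowseq, hiseq).
    """
    # Collapse all whitespace (including newlines) so wrapped records match.
    flat = " ".join(log_text.split())
    windows = []
    for m in HIST_RE.finditer(flat):
        hiseq, lowseq, ackseq = (int(g) for g in m.groups())
        windows.append((hiseq, lowseq, ackseq, hiseq - lowseq))
    return windows


sample = """\
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207,
lowseq=7,ackseq=0,lastmsg=6
Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
lowseq=8,ackseq=0,lastmsg=7
"""

for hiseq, lowseq, ackseq, depth in hist_windows(sample):
    print(f"hiseq={hiseq} lowseq={lowseq} ackseq={ackseq} depth={depth}")
```

On the two sample records this prints a depth of 200 for both, with ackseq 0.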

Logs from the second node:

Mar 14 19:38:13 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23220 ms
Mar 14 19:38:36 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23160 ms
Mar 14 19:38:59 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23220 ms
Mar 14 19:39:22 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23180 ms
Mar 14 19:39:45 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23200 ms
Mar 14 19:40:08 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23150 ms
Mar 14 19:40:32 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23200 ms
<lots of these messages>

Mar 14 19:41:18 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23250 ms
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23580 ms
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Heartbeat restart on
node s-fl2-pls-nac.yardi.com
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Link
s-fl2-pls-nac.yardi.com:eth3 up.
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for
node s-fl2-pls-nac.yardi.com: status init
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Link
s-fl2-pls-nac.yardi.com:eth1 up.
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for
node s-fl2-pls-nac.yardi.com: status up
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: all clients are now
paused
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar 14 19:41:42 S-FL2-SLS-NAC last message repeated 2 times
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for
node s-fl2-pls-nac.yardi.com: status active
Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2492]: info: cib_client_status_callback:
Status update: Client s-fl2-pls-nac.yardi.com/cib now has status [join]
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: WARN: 1 lost packet(s) for
[s-fl2-pls-nac.yardi.com] [42:44]
Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: crmd_ha_status_callback:
Status update: Node s-fl2-pls-nac.yardi.com now has status [init]
Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: info: crmd_ha_status_callback:
Ping node s-fl2-pls-nac.yardi.com is init
Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: crmd_ha_status_callback:
Status update: Node s-fl2-pls-nac.yardi.com now has status [up]
Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: info: crmd_ha_status_callback:
Ping node s-fl2-pls-nac.yardi.com is up
Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: crmd_ha_status_callback:
Status update: Node s-fl2-pls-nac.yardi.com now has status [active]
Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2492]: info: cib_diff_notify: Local-only
Change (client:2496, call: 175): 0.26.612 (ok)
Mar 14 19:41:42 S-FL2-SLS-NAC tengine: [10419]: info: te_update_diff:
Processing diff (cib_update): 0.26.612 -> 0.26.612
Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2991]: info: write_cib_contents: Wrote
version 0.26.612 of the CIB to disk (digest:
e9e9c5aebf16b1faf617dca58907fc8c)
Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: info: No pkts missing from
s-fl2-pls-nac.yardi.com!
Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: WARN: 1 lost packet(s) for
[s-fl2-pls-nac.yardi.com] [47:49]
Mar 14 19:41:43 S-FL2-SLS-NAC crmd: [2496]: notice:
crmd_client_status_callback: Status update: Client
s-fl2-pls-nac.yardi.com/crmd now has status [online]
Mar 14 19:41:43 S-FL2-SLS-NAC crmd: [2496]: info:
crmd_client_status_callback: Uncaching UUID for s-fl2-pls-nac.yardi.com
Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: info: No pkts missing from
s-fl2-pls-nac.yardi.com!
Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
is filling up (200 messages in queue)
Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: info: all clients are now
resumed
Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: info: cib_process_readwrite: We
are now in R/O mode
Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_diff: Diff
0.26.600 -> 0.26.601 not applied to 0.26.612: current "num_updates" is
greater than required
Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: do_cib_notify:
cib_apply_diff of <diff > FAILED: Application of an update diff failed
Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_request:
cib_apply_diff operation failed: Application of an update diff failed
Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_replace:
Replacement 0.26.601 not applied to 0.26.612: current num_updates is greater
than the replacement

<lots of these messages>

Mar 14 19:41:47 S-FL2-SLS-NAC ccm: [2491]: ERROR: Lost connection to
heartbeat service. Need to bail out.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: get_uuid:
get_uuid_by_name() call failed for host s-fl2-pls-nac.yardi.com
Mar 14 19:41:47 S-FL2-SLS-NAC cib: [2492]: ERROR: cib_ha_connection_destroy:
Heartbeat connection lost!  Exiting.
Mar 14 19:41:47 S-FL2-SLS-NAC cib: [2492]: info: uninitializeCib: The CIB
has been deallocated.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition:
s-fl2-sls-nac.yardi.com: State transition S_IDLE -> S_INTEGRATION [
input=I_NODE_JOIN cause=C_HA_MESSAGE origin=route_message ]
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: update_abort_priority:
Abort priority upgraded to 1000000
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: update_dc: Set DC to
<null> (<null>)
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not
connected to Heartbeat
Mar 14 19:41:47 S-FL2-SLS-NAC stonithd: [2494]: ERROR: Disconnected with
heartbeat daemon
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv:
#========= HA[outbound] message start ==========#
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: cib_native_msgready:
Message pending on command channel [2492]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with
10 fields
Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: cib_native_msgready:
Message pending on command channel [2492]
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: crm_log_message_adv:
#========= cib:cmd message start ==========#
Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: CRIT:
attrd_ha_connection_destroy: Lost connection to heartbeat service!
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] :
[origin=join_make_offer]
Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: crm_log_message_adv:
#========= cib:cmd message start ==========#
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: MSG: No message to
dump
Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: cib_native_msgready:
Message pending on command channel [2492]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: MSG: No message to dump
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: cib_native_msgready:
Lost connection to the CIB service [2492].
Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: crm_log_message_adv:
#========= cib:cmd message start ==========#
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7]
Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: MSG: No message to dump
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: info: cib_native_msgready: Lost
connection to the CIB service [2492].
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] :
[reference=join_offer-dc-1205503907-113]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] :
[crm_task=join_offer]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : [crm_sys_to=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] : [crm_sys_from=dc]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] : [crm_host_to=
s-fl2-pls-nac.yardi.com]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [join_id=8]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending
directed HA message (ref=join_offer-dc-1205503907-113) to
[EMAIL PROTECTED] failed.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not
connected to Heartbeat
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv:
#========= HA[outbound] message start ==========#
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with
10 fields
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] :
[origin=join_make_offer]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] :
[reference=join_offer-dc-1205503907-114]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] :
[crm_task=join_offer]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : [crm_sys_to=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] : [crm_sys_from=dc]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] : [crm_host_to=
s-fl2-sls-nac.yardi.com]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [join_id=8]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending
directed HA message (ref=join_offer-dc-1205503907-114) to
[EMAIL PROTECTED] failed.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_join_offer_all:
join-8: Waiting on 2 outstanding join acks
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_election_count_vote:
Election check: vote from s-fl2-pls-nac.yardi.com
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_election_count_vote:
Election won over s-fl2-pls-nac.yardi.com
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition:
s-fl2-sls-nac.yardi.com: State transition S_INTEGRATION -> S_ELECTION [
input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: update_dc: Set DC to
<null> (<null>)
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not
connected to Heartbeat
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv:
#========= HA[outbound] message start ==========#
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with
10 fields
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] :
[origin=do_election_vote]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] :
[reference=vote-crmd-1205503907-115]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=vote]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : [crm_sys_to=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] :
[crm_sys_from=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] :
[election-owner=a5ea4881-0e06-4ea3-83a9-1d0f2184109d]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [election-id=4]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending
broadcast HA message (ref=vote-crmd-1205503907-115) to crmd@<all> failed.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: populate_cib_nodes:
Requesting the list of configured nodes
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: get_uuid:
get_uuid_by_name() call failed for host s-fl2-pls-nac.yardi.com
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: crm_abort: add_node_copy:
Triggered non-fatal assert at xml.c:281 : src_node != NULL
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR:
crmd_cib_connection_destroy: Connection to the CIB terminated...
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not
connected to Heartbeat
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv:
#========= HA[outbound] message start ==========#
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with
10 fields
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] :
[origin=do_election_vote]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] :
[reference=vote-crmd-1205503907-116]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=vote]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : [crm_sys_to=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] :
[crm_sys_from=crmd]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] :
[election-owner=a5ea4881-0e06-4ea3-83a9-1d0f2184109d]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [election-id=5]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending
broadcast HA message (ref=vote-crmd-1205503907-116) to crmd@<all> failed.
Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: Lost connection to
heartbeat service.
Mar 14 19:41:47 S-FL2-SLS-NAC stonithd: [2494]: notice:
/usr/lib/heartbeat/stonithd normally quit.
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR:
stonithd_op_result_ready: failed due to not on signon status.
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR:
tengine_stonith_connection_destroy: Fencing daemon has left us
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_log: [[FSA]] Input
I_ERROR from crmd_cib_connection_destroy() received in state (S_ELECTION)
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: update_abort_priority:
Abort action 2 superceeded by 3
Mar 14 19:41:47 S-FL2-SLS-NAC pengine: [10420]: info: pengine_shutdown:
Exiting PEngine (SIGTERM)
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition:
s-fl2-sls-nac.yardi.com: State transition S_ELECTION -> S_RECOVERY [
input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_cib_connection_destroy ]
Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: notify_crmd: Exiting
after transition
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_recover: Action
A_RECOVER (0000000001000000) not supported
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_release: DC role
released
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to pengine: [10420]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to tengine: [10419]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_log: [[FSA]] Input
I_STOP from do_recover() received in state (S_RECOVERY)
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition:
s-fl2-sls-nac.yardi.com: State transition S_RECOVERY -> S_STOPPING [
input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_release: DC role
released
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to pengine: [10420]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to tengine: [10419]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
the pengine
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to pengine: [10420]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
the tengine
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to tengine: [10419]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for
subsystems to exit
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: register_fsa_input_adv:
do_shutdown stalled the FSA with pending inputs
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: do_log: [[FSA]] Input
I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
the pengine
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to pengine: [10420]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
the tengine
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to tengine: [10419]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for
subsystems to exit
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: register_fsa_input_adv:
do_shutdown stalled the FSA with pending inputs
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: mem_handle_func:IPC
broken, ccm is dead before the client!
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: do_log: [[FSA]] Input
I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
the pengine
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to pengine: [10420]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
the tengine
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM
to tengine: [10419]
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for
subsystems to exit
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: verify_stopped: Checking
for active resources before exit
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: 10
pending LRM operations at shutdown
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: Event-Gateway:25
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: Policy-Manager:41
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: Event-Correlation:39
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: Check-Drives:13
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: IPaddr_corp:19
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: Master-Database:21
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: IPaddr_mgmt:15
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: IPaddr_log:17
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: Events-Database:23
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
Pending action: Admin-Notify:43
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
Events-Database was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
Event-Gateway was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
Policy-Manager was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
IPaddr_corp was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
IPaddr_mgmt was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
IPaddr_log was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
Master-Database was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
Event-Correlation was active at shutdown.  You may ignore this error if it
is unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
Check-Drives was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
Admin-Notify was active at shutdown.  You may ignore this error if it is
unmanaged.
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_exit: Performing
A_EXIT_1 - forcefully exiting the CRMd
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_exit: Could not
recover from internal error
Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_exit: [crmd] stopped
(2)
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Emergency Shutdown:
Master Control process died.
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2411 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2443 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2444 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2445 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2446 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2447 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2448 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2449 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2450 with
SIGTERM
Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Emergency
Shutdown(MCP dead): Killing ourselves.
Mar 14 23:02:12 S-FL2-SLS-NAC auditd[2292]: Audit daemon rotating log files
Mar 15 07:22:19 S-FL2-SLS-NAC auditd[2292]: Audit daemon rotating log files
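In the lines quoted above, the "Late heartbeat" warnings on the second node
repeat roughly every 23 seconds and report intervals between 23150 and
23580 ms. To see the full range across the part we elided, a script can pull
the intervals out of the raw syslog. A minimal Python sketch, assuming the
syslog lines keep the format shown here; the function name late_intervals is
our own. The results would need to be compared against the warntime and
deadtime settings in our ha.cf, which are not included in this post.

```python
import re

# Matches heartbeat's "Late heartbeat" warnings, e.g.
# "WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23220 ms".
LATE_RE = re.compile(r"Late heartbeat: Node\s+(\S+):\s+interval\s+(\d+)\s+ms")


def late_intervals(log_text):
    """Map each node named in a 'Late heartbeat' warning to its intervals (ms)."""
    # Collapse whitespace so warnings wrapped across lines still match.
    flat = " ".join(log_text.split())
    by_node = {}
    for node, ms in LATE_RE.findall(flat):
        by_node.setdefault(node, []).append(int(ms))
    return by_node


sample = """\
Mar 14 19:38:13 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23220 ms
Mar 14 19:38:36 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23160 ms
Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
s-fl2-sls-nac.yardi.com: interval 23580 ms
"""

for node, intervals in late_intervals(sample).items():
    print(f"{node}: {len(intervals)} warnings, "
          f"min {min(intervals)} ms, max {max(intervals)} ms")
```

Grouping by node makes it easy to tell whether the late heartbeats come from
one host or from both.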
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
