Hi Manas,

Can you check whether your local firewall is blocking access to port 694 (or
another port used by HA)? It could also be a general networking problem.
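
If you want to rule the firewall out quickly, here is a minimal sketch (my own
illustration, not part of heartbeat) of a UDP echo test in Python. It assumes
heartbeat is stopped on the listening node, because heartbeat itself holds the
port while it runs, and that you pass in the port from the udpport line of your
ha.cf. The function names echo_once and probe are made up for this example. It
only tests unicast UDP, so it will not catch a problem that affects broadcast
or multicast only.

```python
import socket


def echo_once(bind_addr: str, port: int, timeout: float = 10.0) -> bool:
    """Wait for one UDP datagram on (bind_addr, port) and echo it back.

    Hypothetical helper for this test only. Returns True if a datagram
    arrived before the timeout, False otherwise.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.bind((bind_addr, port))
        try:
            data, peer = sock.recvfrom(1024)
        except OSError:  # includes socket.timeout
            return False
        sock.sendto(data, peer)  # send the same bytes back to the sender
        return True


def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Send one UDP datagram to (host, port) and wait for the echo.

    Returns True only if the identical payload comes back in time.
    """
    payload = b"ha-port-probe"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            data, _ = sock.recvfrom(1024)
        except OSError:  # timeout, or an ICMP error reported by the kernel
            return False
        return data == payload
```

Run echo_once("0.0.0.0", port) as root on one node (ports below 1024 need root
to bind), then probe(other_node_address, port) from the other node. If probe
returns False there but the same test succeeds on a port you know is not
filtered, a packet filter between the nodes is a likely suspect.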

HTH

Nikita Michalko


On Monday, 17 March 2008 14:15, Manas Garg wrote:
> Hi,
>
> We have a two-node setup running heartbeat version 2.0.8-1. On one node,
> heartbeat exited saying Emergency Shutdown. It was restarted. After the
> restart, the heartbeat on the other node exited giving roughly the same
> reason. Can someone please help us identify the issue? Are these known
> bugs, and if so, have they been fixed in later releases?
>
> Any help would be greatly appreciated.
>
> The node configuration:
>
> sh-3.00# uname -a
> Linux S-FL2-PLS-NAC 2.6.17-1.2142_FC4smp #1 SMP Sat Aug 12 08:16:08 EDT
> 2006 i686 i686 i386 GNU/Linux
>
> Following are the logs from the first node:
>
> Mar  3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
> is filling up (197 messages in queue)
> Mar  3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
> is filling up (198 messages in queue)
> Mar  3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
> is filling up (199 messages in queue)
> Mar  3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar  3 14:47:10 S-FL2-PLS-NAC last message repeated 7 times
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7
> for s-fl2-sls-nac.yardi.com: seqno too low
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
> s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207,
> lowseq=7,ackseq=0,lastmsg=6
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7
> for s-fl2-sls-nac.yardi.com: seqno too low
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
> s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207,
> lowseq=7,ackseq=0,lastmsg=6
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8
> for s-fl2-sls-nac.yardi.com: seqno too low
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
> s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
> lowseq=8,ackseq=0,lastmsg=7
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8
> for s-fl2-sls-nac.yardi.com: seqno too low
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
> s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
> lowseq=8,ackseq=0,lastmsg=7
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8
> for s-fl2-sls-nac.yardi.com: seqno too low
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
> s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
> lowseq=8,ackseq=0,lastmsg=7
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8
> for s-fl2-sls-nac.yardi.com: seqno too low
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode =
> s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208,
> lowseq=8,ackseq=0,lastmsg=7
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: lowseq cannnot be
> greater than ackseq
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist->ackseq =10,
> old_ackseq=0
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist->lowseq =201,
> hist->hiseq=208, send_cluster_msg_level=0
> Mar  3 14:47:10 S-FL2-PLS-NAC ccm: [5284]: ERROR: Lost connection to
> heartbeat service. Need to bail out.
> Mar  3 14:47:10 S-FL2-PLS-NAC cib: [5285]: ERROR:
> cib_ha_connection_destroy: Heartbeat connection lost!  Exiting.
> Mar  3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: ERROR: Disconnected with
> heartbeat daemon
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: CRIT: crmd_ha_msg_dispatch:
> Lost connection to heartbeat service.
> Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: Lost connection to
> heartbeat service.
> Mar  3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: notice:
> /usr/lib/heartbeat/stonithd normally quit.
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: mem_handle_func:IPC
> broken, ccm is dead before the client!
> Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: attrd_ha_dispatch: Lost
> connection to heartbeat service.
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: ccm_dispatch: CCM
> connection appears to have failed: rc=-1.
> Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT:
> attrd_ha_connection_destroy: Lost connection to heartbeat service!
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_log: [[FSA]] Input
> I_ERROR from ccm_dispatch() received in state (S_PENDING)
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition:
> s-fl2-pls-nac.yardi.com: State transition S_PENDING -> S_RECOVERY [
> input=I_ERROR cause=C_CCM_CALLBACK origin=ccm_dispatch ]
> Mar  3 14:47:10 S-FL2-PLS-NAC cib: [5285]: info: uninitializeCib: The CIB
> has been deallocated.
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_recover: Action
> A_RECOVER (0000000001000000) not supported
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_log: [[FSA]] Input
> I_STOP from do_recover() received in state (S_RECOVERY)
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition:
> s-fl2-pls-nac.yardi.com: State transition S_RECOVERY -> S_STOPPING [
> input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ]
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_dc_release: DC role
> released
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: WARN: do_log: [[FSA]] Input
> I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition:
> s-fl2-pls-nac.yardi.com: State transition S_STOPPING -> S_TERMINATE [
> input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_shutdown ]
> Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: cib_native_msgready:
> Message pending on command channel [5285]
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: verify_stopped: Checking
> for active resources before exit
> Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: crm_log_message_adv:
> #========= cib:cmd message start ==========#
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: verify_stopped: Checking
> for active resources before exit
> Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: MSG: No message to dump
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_lrm_control:
> Disconnected from the LRM
> Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: cib_native_msgready:
> Message pending on command channel [5285]
> Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: info: cib_native_msgready:
> Lost connection to the CIB service [5285].
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_ha_control:
> Disconnected from Heartbeat
> Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: crm_log_message_adv:
> #========= cib:cmd message start ==========#
> Mar  3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: cib_native_dispatch:
> Lost connection to the CIB service [5285/callback].
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_cib_control:
> Disconnecting CIB
> Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: MSG: No message to dump
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info:
> crmd_cib_connection_destroy: Connection to the CIB terminated...
> Mar  3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: CRIT: cib_native_dispatch:
> Lost connection to the CIB service [5285/callback].
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_exit: Performing
> A_EXIT_0 - gracefully exiting the CRMd
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_exit: Could not
> recover from internal error
> Mar  3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_exit: [crmd] stopped
> (2)
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Emergency Shutdown:
> Master Control process died.
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5057
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5062
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5063
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5064
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5065
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5066
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5067
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5068
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5069
> with SIGTERM
> Mar  3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Emergency
> Shutdown(MCP dead): Killing ourselves.
> Mar  3 16:00:12 S-FL2-PLS-NAC auditd[2341]: Audit daemon rotating log files
> Mar  3 19:06:54 S-FL2-PLS-NAC auditd[2341]: Audit daemon rotating log files
>
> Logs from the second node are:
>
> Mar 14 19:38:13 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23220 ms
> Mar 14 19:38:36 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23160 ms
> Mar 14 19:38:59 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23220 ms
> Mar 14 19:39:22 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23180 ms
> Mar 14 19:39:45 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23200 ms
> Mar 14 19:40:08 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23150 ms
> Mar 14 19:40:32 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23200 ms
> <lots of these messages>
>
> Mar 14 19:41:18 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23250 ms
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node
> s-fl2-sls-nac.yardi.com: interval 23580 ms
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Heartbeat restart on
> node s-fl2-pls-nac.yardi.com
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Link
> s-fl2-pls-nac.yardi.com:eth3 up.
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for
> node s-fl2-pls-nac.yardi.com: status init
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Link
> s-fl2-pls-nac.yardi.com:eth1 up.
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for
> node s-fl2-pls-nac.yardi.com: status up
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: all clients are now
> paused
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar 14 19:41:42 S-FL2-SLS-NAC last message repeated 2 times
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for
> node s-fl2-pls-nac.yardi.com: status active
> Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2492]: info:
> cib_client_status_callback: Status update: Client
> s-fl2-pls-nac.yardi.com/cib now has status [join]
> Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: WARN: 1 lost packet(s) for
> [s-fl2-pls-nac.yardi.com] [42:44]
> Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice:
> crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now
> has status [init]
> Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: info: crmd_ha_status_callback:
> Ping node s-fl2-pls-nac.yardi.com is init
> Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice:
> crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now
> has status [up]
> Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: info: crmd_ha_status_callback:
> Ping node s-fl2-pls-nac.yardi.com is up
> Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice:
> crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now
> has status [active]
> Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2492]: info:
> cib_diff_notify: Local-only Change (client:2496, call: 175): 0.26.612 (ok)
> Mar 14 19:41:42 S-FL2-SLS-NAC tengine: [10419]: info: te_update_diff:
> Processing diff (cib_update): 0.26.612 -> 0.26.612
> Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2991]: info: write_cib_contents: Wrote
> version 0.26.612 of the CIB to disk (digest:
> e9e9c5aebf16b1faf617dca58907fc8c)
> Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: info: No pkts missing from
> s-fl2-pls-nac.yardi.com!
> Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: WARN: 1 lost packet(s) for
> [s-fl2-pls-nac.yardi.com] [47:49]
> Mar 14 19:41:43 S-FL2-SLS-NAC crmd: [2496]: notice:
> crmd_client_status_callback: Status update: Client
> s-fl2-pls-nac.yardi.com/crmd now has status [online]
> Mar 14 19:41:43 S-FL2-SLS-NAC crmd: [2496]: info:
> crmd_client_status_callback: Uncaching UUID for s-fl2-pls-nac.yardi.com
> Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: info: No pkts missing from
> s-fl2-pls-nac.yardi.com!
> Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue
> is filling up (200 messages in queue)
> Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: info: all clients are now
> resumed
> Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: info: cib_process_readwrite: We
> are now in R/O mode
> Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_diff: Diff
> 0.26.600 -> 0.26.601 not applied to 0.26.612: current "num_updates" is
> greater than required
> Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: do_cib_notify:
> cib_apply_diff of <diff > FAILED: Application of an update diff failed
> Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_request:
> cib_apply_diff operation failed: Application of an update diff failed
> Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_replace:
> Replacement 0.26.601 not applied to 0.26.612: current num_updates is
> greater than the replacement
>
> <lots of these messages>
>
> Mar 14 19:41:47 S-FL2-SLS-NAC ccm: [2491]: ERROR: Lost connection to
> heartbeat service. Need to bail out.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: get_uuid:
> get_uuid_by_name() call failed for host s-fl2-pls-nac.yardi.com
> Mar 14 19:41:47 S-FL2-SLS-NAC cib: [2492]: ERROR:
> cib_ha_connection_destroy: Heartbeat connection lost!  Exiting.
> Mar 14 19:41:47 S-FL2-SLS-NAC cib: [2492]: info: uninitializeCib: The CIB
> has been deallocated.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition:
> s-fl2-sls-nac.yardi.com: State transition S_IDLE -> S_INTEGRATION [
> input=I_NODE_JOIN cause=C_HA_MESSAGE origin=route_message ]
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info:
> update_abort_priority: Abort priority upgraded to 1000000
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: update_dc: Set DC to
> <null> (<null>)
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not
> connected to Heartbeat
> Mar 14 19:41:47 S-FL2-SLS-NAC stonithd: [2494]: ERROR: Disconnected with
> heartbeat daemon
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv:
> #========= HA[outbound] message start ==========#
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: cib_native_msgready:
> Message pending on command channel [2492]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with
> 10 fields
> Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: cib_native_msgready:
> Message pending on command channel [2492]
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: crm_log_message_adv:
> #========= cib:cmd message start ==========#
> Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: CRIT:
> attrd_ha_connection_destroy: Lost connection to heartbeat service!
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] :
> [origin=join_make_offer]
> Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: crm_log_message_adv:
> #========= cib:cmd message start ==========#
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: MSG: No message to
> dump
> Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: cib_native_msgready:
> Message pending on command channel [2492]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: MSG: No message to dump
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: cib_native_msgready:
> Lost connection to the CIB service [2492].
> Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: crm_log_message_adv:
> #========= cib:cmd message start ==========#
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7]
> Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: MSG: No message to dump
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
> Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: info: cib_native_msgready:
> Lost connection to the CIB service [2492].
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] :
> [reference=join_offer-dc-1205503907-113]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] :
> [crm_task=join_offer]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] :
> [crm_sys_to=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] :
> [crm_sys_from=dc]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] :
> [crm_host_to= s-fl2-pls-nac.yardi.com]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [join_id=8]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending
> directed HA message (ref=join_offer-dc-1205503907-113) to
> [EMAIL PROTECTED] failed.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not
> connected to Heartbeat
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv:
> #========= HA[outbound] message start ==========#
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with
> 10 fields
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] :
> [origin=join_make_offer]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] :
> [reference=join_offer-dc-1205503907-114]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] :
> [crm_task=join_offer]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] :
> [crm_sys_to=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] :
> [crm_sys_from=dc]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] :
> [crm_host_to= s-fl2-sls-nac.yardi.com]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [join_id=8]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending
> directed HA message (ref=join_offer-dc-1205503907-114) to
> [EMAIL PROTECTED] failed.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_join_offer_all:
> join-8: Waiting on 2 outstanding join acks
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_election_count_vote:
> Election check: vote from s-fl2-pls-nac.yardi.com
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_election_count_vote:
> Election won over s-fl2-pls-nac.yardi.com
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition:
> s-fl2-sls-nac.yardi.com: State transition S_INTEGRATION -> S_ELECTION [
> input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: update_dc: Set DC to
> <null> (<null>)
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not
> connected to Heartbeat
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv:
> #========= HA[outbound] message start ==========#
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with
> 10 fields
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] :
> [origin=do_election_vote]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] :
> [reference=vote-crmd-1205503907-115]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=vote]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] :
> [crm_sys_to=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] :
> [crm_sys_from=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] :
> [election-owner=a5ea4881-0e06-4ea3-83a9-1d0f2184109d]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [election-id=4]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending
> broadcast HA message (ref=vote-crmd-1205503907-115) to crmd@<all> failed.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: populate_cib_nodes:
> Requesting the list of configured nodes
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: get_uuid:
> get_uuid_by_name() call failed for host s-fl2-pls-nac.yardi.com
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: crm_abort:
> add_node_copy: Triggered non-fatal assert at xml.c:281 : src_node != NULL
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR:
> crmd_cib_connection_destroy: Connection to the CIB terminated...
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not
> connected to Heartbeat
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv:
> #========= HA[outbound] message start ==========#
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with
> 10 fields
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] :
> [origin=do_election_vote]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] :
> [reference=vote-crmd-1205503907-116]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=vote]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] :
> [crm_sys_to=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] :
> [crm_sys_from=crmd]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] :
> [election-owner=a5ea4881-0e06-4ea3-83a9-1d0f2184109d]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [election-id=5]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending
> broadcast HA message (ref=vote-crmd-1205503907-116) to crmd@<all> failed.
> Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: Lost connection to
> heartbeat service.
> Mar 14 19:41:47 S-FL2-SLS-NAC stonithd: [2494]: notice:
> /usr/lib/heartbeat/stonithd normally quit.
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR:
> stonithd_op_result_ready: failed due to not on signon status.
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR:
> tengine_stonith_connection_destroy: Fencing daemon has left us
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_log: [[FSA]] Input
> I_ERROR from crmd_cib_connection_destroy() received in state (S_ELECTION)
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info:
> update_abort_priority: Abort action 2 superceeded by 3
> Mar 14 19:41:47 S-FL2-SLS-NAC pengine: [10420]: info: pengine_shutdown:
> Exiting PEngine (SIGTERM)
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition:
> s-fl2-sls-nac.yardi.com: State transition S_ELECTION -> S_RECOVERY [
> input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_cib_connection_destroy ]
> Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: notify_crmd: Exiting
> after transition
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_recover: Action
> A_RECOVER (0000000001000000) not supported
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_release: DC role
> released
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to pengine: [10420]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to tengine: [10419]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_log: [[FSA]] Input
> I_STOP from do_recover() received in state (S_RECOVERY)
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition:
> s-fl2-sls-nac.yardi.com: State transition S_RECOVERY -> S_STOPPING [
> input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_release: DC role
> released
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to pengine: [10420]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to tengine: [10419]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
> the pengine
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to pengine: [10420]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
> the tengine
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to tengine: [10419]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for
> subsystems to exit
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: register_fsa_input_adv:
> do_shutdown stalled the FSA with pending inputs
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: do_log: [[FSA]] Input
> I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
> the pengine
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to pengine: [10420]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
> the tengine
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to tengine: [10419]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for
> subsystems to exit
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: register_fsa_input_adv:
> do_shutdown stalled the FSA with pending inputs
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: mem_handle_func:IPC
> broken, ccm is dead before the client!
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: do_log: [[FSA]] Input
> I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
> the pengine
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to pengine: [10420]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating
> the tengine
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent
> -TERM to tengine: [10419]
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for
> subsystems to exit
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: verify_stopped: Checking
> for active resources before exit
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: 10
> pending LRM operations at shutdown
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: Event-Gateway:25
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: Policy-Manager:41
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: Event-Correlation:39
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: Check-Drives:13
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: IPaddr_corp:19
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: Master-Database:21
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: IPaddr_mgmt:15
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: IPaddr_log:17
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: Events-Database:23
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending:
> Pending action: Admin-Notify:43
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> Events-Database was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> Event-Gateway was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> Policy-Manager was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> IPaddr_corp was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> IPaddr_mgmt was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> IPaddr_log was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> Master-Database was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> Event-Correlation was active at shutdown.  You may ignore this error if it
> is unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> Check-Drives was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource
> Admin-Notify was active at shutdown.  You may ignore this error if it is
> unmanaged.
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_exit: Performing
> A_EXIT_1 - forcefully exiting the CRMd
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_exit: Could not
> recover from internal error
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_exit: [crmd] stopped
> (2)
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Emergency Shutdown:
> Master Control process died.
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2411
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2443
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2444
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2445
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2446
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2447
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2448
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2449
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2450
> with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Emergency
> Shutdown(MCP dead): Killing ourselves.
> Mar 14 23:02:12 S-FL2-SLS-NAC auditd[2292]: Audit daemon rotating log files
> Mar 15 07:22:19 S-FL2-SLS-NAC auditd[2292]: Audit daemon rotating log files
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

