Hi Manas,

can you check whether your local firewall is blocking access to UDP port 694 (the heartbeat default) or to whichever port your HA setup uses? It could also be some other networking problem.
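To get a quick picture of how often heartbeats arrive late, it may help to pull the numbers out of the syslog output. The following is only a rough sketch, assuming the log lines look like the ones quoted below; the names `summarize`, `LATE_RE` and `LEVEL_RE` are made up for this example and are not part of heartbeat.

```python
import re
from collections import Counter

# Matches e.g. "Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23220 ms"
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")
# Matches the severity tag that follows the "[pid]:" field in the quoted logs.
LEVEL_RE = re.compile(r"\]: (CRIT|ERROR|WARN|info|notice):")


def summarize(lines):
    """Return (severity counts, {node: [late-heartbeat intervals in ms]})."""
    levels = Counter()
    late = {}
    for line in lines:
        level = LEVEL_RE.search(line)
        if level:
            levels[level.group(1)] += 1
        hit = LATE_RE.search(line)
        if hit:
            late.setdefault(hit.group(1), []).append(int(hit.group(2)))
    return levels, late


# Three lines copied from the quoted logs, joined back onto one line each.
sample = [
    "Mar 14 19:38:13 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: "
    "Node s-fl2-sls-nac.yardi.com: interval 23220 ms",
    "Mar 14 19:38:36 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: "
    "Node s-fl2-sls-nac.yardi.com: interval 23160 ms",
    "Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue "
    "is filling up (200 messages in queue)",
]

levels, late = summarize(sample)
for node, intervals in late.items():
    print(node, "late heartbeats:", len(intervals), "max ms:", max(intervals))
```

To run it over a real log, pass an open file object to `summarize` instead of `sample`; where syslog writes the heartbeat messages depends on your configuration, so the path is left to you.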
HTH

Nikita Michalko

On Monday, 17 March 2008 14:15, Manas Garg wrote:
> Hi,
>
> We have a two-node setup running heartbeat version 2.0.8-1. On one node,
> heartbeat exited saying Emergency Shutdown. It was restarted. After the
> restart, the heartbeat on the other node exited giving roughly the same
> reason. Can someone please help us identify the issue, and tell us whether
> these are known bugs that have been fixed in later releases?
>
> Any help would be greatly appreciated.
>
> The node configuration:
>
> sh-3.00# uname -a
> Linux S-FL2-PLS-NAC 2.6.17-1.2142_FC4smp #1 SMP Sat Aug 12 08:16:08 EDT 2006 i686 i686 i386 GNU/Linux
>
> Following are the logs from the first node:
>
> Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (197 messages in queue)
> Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (198 messages in queue)
> Mar 3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (199 messages in queue)
> Mar 3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (200 messages in queue)
> Mar 3 14:47:10 S-FL2-PLS-NAC last message repeated 7 times
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7 for s-fl2-sls-nac.yardi.com: seqno too low
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207, lowseq=7,ackseq=0,lastmsg=6
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7 for s-fl2-sls-nac.yardi.com: seqno too low
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq
=207, > lowseq=7,ackseq=0,lastmsg=6 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue > is filling up (200 messages in queue) > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 > for s-fl2-sls-nac.yardi.com: seqno too low > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = > s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information: > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, > lowseq=8,ackseq=0,lastmsg=7 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 > for s-fl2-sls-nac.yardi.com: seqno too low > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = > s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information: > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, > lowseq=8,ackseq=0,lastmsg=7 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 > for s-fl2-sls-nac.yardi.com: seqno too low > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = > s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information: > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, > lowseq=8,ackseq=0,lastmsg=7 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 > for s-fl2-sls-nac.yardi.com: seqno too low > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = > s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information: > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, > lowseq=8,ackseq=0,lastmsg=7 > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: lowseq cannnot be > greater than ackseq > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist->ackseq =10, > old_ackseq=0 > Mar 3 14:47:10 
S-FL2-PLS-NAC heartbeat: [5057]: info: hist->lowseq =201, > hist->hiseq=208, send_cluster_msg_level=0 > Mar 3 14:47:10 S-FL2-PLS-NAC ccm: [5284]: ERROR: Lost connection to > heartbeat service. Need to bail out. > Mar 3 14:47:10 S-FL2-PLS-NAC cib: [5285]: ERROR: > cib_ha_connection_destroy: Heartbeat connection lost! Exiting. > Mar 3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: ERROR: Disconnected with > heartbeat daemon > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: CRIT: crmd_ha_msg_dispatch: > Lost connection to heartbeat service. > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: Lost connection to > heartbeat service. > Mar 3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: notice: > /usr/lib/heartbeat/stonithd normally quit. > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: mem_handle_func:IPC > broken, ccm is dead before the client! > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: attrd_ha_dispatch: Lost > connection to heartbeat service. > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: ccm_dispatch: CCM > connection appears to have failed: rc=-1. > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: > attrd_ha_connection_destroy: Lost connection to heartbeat service! > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_log: [[FSA]] Input > I_ERROR from ccm_dispatch() received in state (S_PENDING) > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition: > s-fl2-pls-nac.yardi.com: State transition S_PENDING -> S_RECOVERY [ > input=I_ERROR cause=C_CCM_CALLBACK origin=ccm_dispatch ] > Mar 3 14:47:10 S-FL2-PLS-NAC cib: [5285]: info: uninitializeCib: The CIB > has been deallocated. 
> Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_recover: Action > A_RECOVER (0000000001000000) not supported > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_log: [[FSA]] Input > I_STOP from do_recover() received in state (S_RECOVERY) > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition: > s-fl2-pls-nac.yardi.com: State transition S_RECOVERY -> S_STOPPING [ > input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ] > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_dc_release: DC role > released > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: WARN: do_log: [[FSA]] Input > I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING) > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition: > s-fl2-pls-nac.yardi.com: State transition S_STOPPING -> S_TERMINATE [ > input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_shutdown ] > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: cib_native_msgready: > Message pending on command channel [5285] > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: verify_stopped: Checking > for active resources before exit > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: crm_log_message_adv: > #========= cib:cmd message start ==========# > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: verify_stopped: Checking > for active resources before exit > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: MSG: No message to dump > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_lrm_control: > Disconnected from the LRM > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: cib_native_msgready: > Message pending on command channel [5285] > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: info: cib_native_msgready: > Lost connection to the CIB service [5285]. 
> Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_ha_control: > Disconnected from Heartbeat > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: crm_log_message_adv: > #========= cib:cmd message start ==========# > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: cib_native_dispatch: > Lost connection to the CIB service [5285/callback]. > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_cib_control: > Disconnecting CIB > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: MSG: No message to dump > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: > crmd_cib_connection_destroy: Connection to the CIB terminated... > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: CRIT: cib_native_dispatch: > Lost connection to the CIB service [5285/callback]. > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_exit: Performing > A_EXIT_0 - gracefully exiting the CRMd > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_exit: Could not > recover from internal error > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_exit: [crmd] stopped > (2) > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Emergency Shutdown: > Master Control process died. 
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5057 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5062 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5063 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5064 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5065 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5066 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5067 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5068 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5069 with SIGTERM
> Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
> Mar 3 16:00:12 S-FL2-PLS-NAC auditd[2341]: Audit daemon rotating log files
> Mar 3 19:06:54 S-FL2-PLS-NAC auditd[2341]: Audit daemon rotating log files
>
> Logs from the second node are:
>
> Mar 14 19:38:13 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23220 ms
> Mar 14 19:38:36 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23160 ms
> Mar 14 19:38:59 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23220 ms
> Mar 14 19:39:22 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23180 ms
> Mar 14 19:39:45 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23200 ms
> Mar 14 19:40:08 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23150 ms
> Mar 14 19:40:32 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23200 ms
> <lots of these messages>
>
> Mar 14 19:41:18
S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node > s-fl2-sls-nac.yardi.com: interval 23250 ms > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node > s-fl2-sls-nac.yardi.com: interval 23580 ms > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Heartbeat restart on > node s-fl2-pls-nac.yardi.com > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Link > s-fl2-pls-nac.yardi.com:eth3 up. > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for > node s-fl2-pls-nac.yardi.com: status init > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Link > s-fl2-pls-nac.yardi.com:eth1 up. > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for > node s-fl2-pls-nac.yardi.com: status up > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue > is filling up (200 messages in queue) > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue > is filling up (200 messages in queue) > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: all clients are now > paused > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue > is filling up (200 messages in queue) > Mar 14 19:41:42 S-FL2-SLS-NAC last message repeated 2 times > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for > node s-fl2-pls-nac.yardi.com: status active > Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2492]: info: > cib_client_status_callback: Status update: Client > s-fl2-pls-nac.yardi.com/cib now has status [join] Mar 14 19:41:42 > S-FL2-SLS-NAC heartbeat: [2411]: WARN: 1 lost packet(s) for > [s-fl2-pls-nac.yardi.com] [42:44] > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: > crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now > has status [init] > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: info: crmd_ha_status_callback: > Ping node s-fl2-pls-nac.yardi.com is init > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: > 
crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now > has status [up] > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: info: crmd_ha_status_callback: > Ping node s-fl2-pls-nac.yardi.com is up > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: > crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now > has status [active] Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2492]: info: > cib_diff_notify: Local-only Change (client:2496, call: 175): 0.26.612 (ok) > Mar 14 19:41:42 S-FL2-SLS-NAC tengine: [10419]: info: te_update_diff: > Processing diff (cib_update): 0.26.612 -> 0.26.612 > Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2991]: info: write_cib_contents: Wrote > version 0.26.612 of the CIB to disk (digest: > e9e9c5aebf16b1faf617dca58907fc8c) > Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue > is filling up (200 messages in queue) > Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: info: No pkts missing from > s-fl2-pls-nac.yardi.com! > Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue > is filling up (200 messages in queue) > Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: WARN: 1 lost packet(s) for > [s-fl2-pls-nac.yardi.com] [47:49] > Mar 14 19:41:43 S-FL2-SLS-NAC crmd: [2496]: notice: > crmd_client_status_callback: Status update: Client > s-fl2-pls-nac.yardi.com/crmd now has status [online] > Mar 14 19:41:43 S-FL2-SLS-NAC crmd: [2496]: info: > crmd_client_status_callback: Uncaching UUID for s-fl2-pls-nac.yardi.com > Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue > is filling up (200 messages in queue) > Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: info: No pkts missing from > s-fl2-pls-nac.yardi.com! 
> Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue > is filling up (200 messages in queue) > Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: info: all clients are now > resumed > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: info: cib_process_readwrite: We > are now in R/O mode > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_diff: Diff > 0.26.600 -> 0.26.601 not applied to 0.26.612: current "num_updates" is > greater than required > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: do_cib_notify: > cib_apply_diff of <diff > FAILED: Application of an update diff failed > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_request: > cib_apply_diff operation failed: Application of an update diff failed > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_replace: > Replacement 0.26.601 not applied to 0.26.612: current num_updates is > greater than the replacement > > <lots of these messages> > > Mar 14 19:41:47 S-FL2-SLS-NAC ccm: [2491]: ERROR: Lost connection to > heartbeat service. Need to bail out. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: get_uuid: > get_uuid_by_name() call failed for host s-fl2-pls-nac.yardi.com > Mar 14 19:41:47 S-FL2-SLS-NAC cib: [2492]: ERROR: > cib_ha_connection_destroy: Heartbeat connection lost! Exiting. > Mar 14 19:41:47 S-FL2-SLS-NAC cib: [2492]: info: uninitializeCib: The CIB > has been deallocated. 
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition: > s-fl2-sls-nac.yardi.com: State transition S_IDLE -> S_INTEGRATION [ > input=I_NODE_JOIN cause=C_HA_MESSAGE origin=route_message ] > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: > update_abort_priority: Abort priority upgraded to 1000000 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: update_dc: Set DC to > <null> (<null>) > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not > connected to Heartbeat > Mar 14 19:41:47 S-FL2-SLS-NAC stonithd: [2494]: ERROR: Disconnected with > heartbeat daemon > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv: > #========= HA[outbound] message start ==========# > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: cib_native_msgready: > Message pending on command channel [2492] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with > 10 fields > Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: cib_native_msgready: > Message pending on command channel [2492] > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: crm_log_message_adv: > #========= cib:cmd message start ==========# > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: CRIT: > attrd_ha_connection_destroy: Lost connection to heartbeat service! > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] : > [origin=join_make_offer] > Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: crm_log_message_adv: > #========= cib:cmd message start ==========# > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: MSG: No message to > dump > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: cib_native_msgready: > Message pending on command channel [2492] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd] > Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: MSG: No message to dump > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: cib_native_msgready: > Lost connection to the CIB service [2492]. 
> Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: crm_log_message_adv: > #========= cib:cmd message start ==========# > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7] > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: MSG: No message to dump > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request] > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: info: cib_native_msgready: > Lost connection to the CIB service [2492]. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] : > [reference=join_offer-dc-1205503907-113] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : > [crm_task=join_offer] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : > [crm_sys_to=crmd] Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] > : [crm_sys_from=dc] Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: > MSG[8] : [crm_host_to= s-fl2-pls-nac.yardi.com] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [join_id=8] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending > directed HA message (ref=join_offer-dc-1205503907-113) to > [EMAIL PROTECTED] failed. 
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not > connected to Heartbeat > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv: > #========= HA[outbound] message start ==========# > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with > 10 fields > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] : > [origin=join_make_offer] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] : > [reference=join_offer-dc-1205503907-114] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : > [crm_task=join_offer] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : > [crm_sys_to=crmd] Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] > : [crm_sys_from=dc] Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: > MSG[8] : [crm_host_to= s-fl2-sls-nac.yardi.com] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [join_id=8] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending > directed HA message (ref=join_offer-dc-1205503907-114) to > [EMAIL PROTECTED] failed. 
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_join_offer_all: > join-8: Waiting on 2 outstanding join acks > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_election_count_vote: > Election check: vote from s-fl2-pls-nac.yardi.com > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_election_count_vote: > Election won over s-fl2-pls-nac.yardi.com > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition: > s-fl2-sls-nac.yardi.com: State transition S_INTEGRATION -> S_ELECTION [ > input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: update_dc: Set DC to > <null> (<null>) > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not > connected to Heartbeat > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv: > #========= HA[outbound] message start ==========# > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with > 10 fields > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] : > [origin=do_election_vote] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] : > [reference=vote-crmd-1205503907-115] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=vote] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : > [crm_sys_to=crmd] Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] > : > [crm_sys_from=crmd] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] : > [election-owner=a5ea4881-0e06-4ea3-83a9-1d0f2184109d] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [election-id=4] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending > broadcast HA message (ref=vote-crmd-1205503907-115) to crmd@<all> failed. 
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: populate_cib_nodes: > Requesting the list of configured nodes > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: get_uuid: > get_uuid_by_name() call failed for host s-fl2-pls-nac.yardi.com > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: crm_abort: > add_node_copy: Triggered non-fatal assert at xml.c:281 : src_node != NULL > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: > crmd_cib_connection_destroy: Connection to the CIB terminated... > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not > connected to Heartbeat > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv: > #========= HA[outbound] message start ==========# > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with > 10 fields > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] : > [origin=do_election_vote] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version=1.0.7] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] : > [reference=vote-crmd-1205503907-116] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=vote] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : > [crm_sys_to=crmd] Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] > : > [crm_sys_from=crmd] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] : > [election-owner=a5ea4881-0e06-4ea3-83a9-1d0f2184109d] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [election-id=5] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending > broadcast HA message (ref=vote-crmd-1205503907-116) to crmd@<all> failed. > Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: Lost connection to > heartbeat service. 
> Mar 14 19:41:47 S-FL2-SLS-NAC stonithd: [2494]: notice: > /usr/lib/heartbeat/stonithd normally quit. > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: > stonithd_op_result_ready: failed due to not on signon status. > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: > tengine_stonith_connection_destroy: Fencing daemon has left us > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_log: [[FSA]] Input > I_ERROR from crmd_cib_connection_destroy() received in state (S_ELECTION) > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: > update_abort_priority: Abort action 2 superceeded by 3 > Mar 14 19:41:47 S-FL2-SLS-NAC pengine: [10420]: info: pengine_shutdown: > Exiting PEngine (SIGTERM) > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition: > s-fl2-sls-nac.yardi.com: State transition S_ELECTION -> S_RECOVERY [ > input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_cib_connection_destroy ] > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: notify_crmd: Exiting > after transition > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_recover: Action > A_RECOVER (0000000001000000) not supported > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_release: DC role > released > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to pengine: [10420] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to tengine: [10419] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_log: [[FSA]] Input > I_STOP from do_recover() received in state (S_RECOVERY) > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition: > s-fl2-sls-nac.yardi.com: State transition S_RECOVERY -> S_STOPPING [ > input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_release: DC role > released > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to pengine: [10420] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: 
info: stop_subsystem: Sent > -TERM to tengine: [10419] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating > the pengine > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to pengine: [10420] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating > the tengine > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to tengine: [10419] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for > subsystems to exit > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: register_fsa_input_adv: > do_shutdown stalled the FSA with pending inputs > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: do_log: [[FSA]] Input > I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING) > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating > the pengine > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to pengine: [10420] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating > the tengine > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to tengine: [10419] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for > subsystems to exit > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: register_fsa_input_adv: > do_shutdown stalled the FSA with pending inputs > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: mem_handle_func:IPC > broken, ccm is dead before the client! 
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: do_log: [[FSA]] Input > I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING) > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating > the pengine > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to pengine: [10420] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating > the tengine > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent > -TERM to tengine: [10419] > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for > subsystems to exit > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: verify_stopped: Checking > for active resources before exit > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: 10 > pending LRM operations at shutdown > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: Event-Gateway:25 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: Policy-Manager:41 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: Event-Correlation:39 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: Check-Drives:13 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: IPaddr_corp:19 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: Master-Database:21 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: IPaddr_mgmt:15 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: IPaddr_log:17 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: Events-Database:23 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: > Pending action: Admin-Notify:43 > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: 
verify_stopped: Resource > Events-Database was active at shutdown. You may ignore this error if it is > unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > Event-Gateway was active at shutdown. You may ignore this error if it is > unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > Policy-Manager was active at shutdown. You may ignore this error if it is > unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > IPaddr_corp was active at shutdown. You may ignore this error if it is > unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > IPaddr_mgmt was active at shutdown. You may ignore this error if it is > unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > IPaddr_log was active at shutdown. You may ignore this error if it is > unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > Master-Database was active at shutdown. You may ignore this error if it is > unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > Event-Correlation was active at shutdown. You may ignore this error if it > is unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > Check-Drives was active at shutdown. You may ignore this error if it is > unmanaged. > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource > Admin-Notify was active at shutdown. You may ignore this error if it is > unmanaged. 
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_exit: Performing A_EXIT_1 - forcefully exiting the CRMd
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_exit: Could not recover from internal error
> Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_exit: [crmd] stopped (2)
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Emergency Shutdown: Master Control process died.
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2411 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2443 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2444 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2445 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2446 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2447 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2448 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2449 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2450 with SIGTERM
> Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
> Mar 14 23:02:12 S-FL2-SLS-NAC auditd[2292]: Audit daemon rotating log files
> Mar 15 07:22:19 S-FL2-SLS-NAC auditd[2292]: Audit daemon rotating log files
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
