Hi Nikita,

I have checked that. The same system worked fine for several months, and after the restarts it is working fine again.
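In case it is useful for anyone reading the logs quoted below, here is a rough sketch of how the entries could be tallied per daemon and severity. It is plain Python, not a heartbeat tool. It assumes the syslog layout seen in our logs (date, host, daemon, [pid], level, text), and the names LINE_RE and summarize are made up for this example.

```python
import re
from collections import Counter

# Assumed layout (taken from the quoted logs), for example:
#   Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue ...
# LINE_RE and summarize are illustrative names, not part of heartbeat.
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<daemon>[\w-]+):\s+\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>[A-Za-z]+):\s+(?P<text>.*)$"
)


def summarize(lines):
    """Count entries per (daemon, level); lines in another layout are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("daemon"), match.group("level"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist "
        "queue is filling up (197 messages in queue)",
        "Mar 3 14:47:10 S-FL2-PLS-NAC last message repeated 7 times",
        "Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: CRIT: "
        "crmd_ha_msg_dispatch: Lost connection to heartbeat service.",
    ]
    for (daemon, level), count in sorted(summarize(sample).items()):
        print(daemon, level, count)
```

Lines that do not follow that layout, such as "last message repeated 7 times" or the auditd entries, are simply ignored, so the counts only cover the heartbeat and CRM daemons.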
Manas

On Wed, Mar 19, 2008 at 3:53 PM, Nikita Michalko <[EMAIL PROTECTED]> wrote:
> Hi Manas,
>
> Can you check whether your local firewall is blocking access to port 694
> (or another port used by HA)? Maybe there are some networking problems?
>
> HTH
>
> Nikita Michalko
>
>
> On Monday, 17 March 2008 at 14:15, Manas Garg wrote:
> > Hi,
> >
> > We have a two-node setup running heartbeat version 2.0.8-1. On one node,
> > heartbeat exited with an "Emergency Shutdown" message. It was restarted.
> > After the restart, heartbeat on the other node exited, giving roughly the
> > same reason. Can someone please help us identify the issue? Are these
> > known bugs, and if so, have they been fixed in later releases?
> >
> > Any help would be greatly appreciated.
> >
> > The node configuration:
> >
> > sh-3.00# uname -a
> > Linux S-FL2-PLS-NAC 2.6.17-1.2142_FC4smp #1 SMP Sat Aug 12 08:16:08 EDT 2006 i686 i686 i386 GNU/Linux
> >
> > The logs from the first node follow:
> >
> > Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (197 messages in queue)
> > Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (198 messages in queue)
> > Mar 3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (199 messages in queue)
> > Mar 3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 3 14:47:10 S-FL2-PLS-NAC last message repeated 7 times
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7 for s-fl2-sls-nac.yardi.com: seqno too low
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207, lowseq=7,ackseq=0,lastmsg=6
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7 for s-fl2-sls-nac.yardi.com: seqno too low
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207, lowseq=7,ackseq=0,lastmsg=6
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: lowseq cannnot be greater than ackseq
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist->ackseq =10, old_ackseq=0
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist->lowseq =201, hist->hiseq=208, send_cluster_msg_level=0
> > Mar 3 14:47:10 S-FL2-PLS-NAC ccm: [5284]: ERROR: Lost connection to heartbeat service. Need to bail out.
> > Mar 3 14:47:10 S-FL2-PLS-NAC cib: [5285]: ERROR: cib_ha_connection_destroy: Heartbeat connection lost! Exiting.
> > Mar 3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: ERROR: Disconnected with heartbeat daemon
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: CRIT: crmd_ha_msg_dispatch: Lost connection to heartbeat service.
> > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: Lost connection to heartbeat service.
> > Mar 3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: notice: /usr/lib/heartbeat/stonithd normally quit.
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: mem_handle_func:IPC broken, ccm is dead before the client!
> > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: attrd_ha_dispatch: Lost connection to heartbeat service.
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: ccm_dispatch: CCM connection appears to have failed: rc=-1.
> > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: attrd_ha_connection_destroy: Lost connection to heartbeat service!
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_log: [[FSA]] Input I_ERROR from ccm_dispatch() received in state (S_PENDING)
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition: s-fl2-pls-nac.yardi.com: State transition S_PENDING -> S_RECOVERY [ input=I_ERROR cause=C_CCM_CALLBACK origin=ccm_dispatch ]
> > Mar 3 14:47:10 S-FL2-PLS-NAC cib: [5285]: info: uninitializeCib: The CIB has been deallocated.
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_log: [[FSA]] Input I_STOP from do_recover() received in state (S_RECOVERY)
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition: s-fl2-pls-nac.yardi.com: State transition S_RECOVERY -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ]
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_dc_release: DC role released
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: WARN: do_log: [[FSA]] Input I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_state_transition: s-fl2-pls-nac.yardi.com: State transition S_STOPPING -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_shutdown ]
> > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: cib_native_msgready: Message pending on command channel [5285]
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: verify_stopped: Checking for active resources before exit
> > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: verify_stopped: Checking for active resources before exit
> > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: ERROR: MSG: No message to dump
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_lrm_control: Disconnected from the LRM
> > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: cib_native_msgready: Message pending on command channel [5285]
> > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: info: cib_native_msgready: Lost connection to the CIB service [5285].
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_ha_control: Disconnected from Heartbeat
> > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
> > Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: cib_native_dispatch: Lost connection to the CIB service [5285/callback].
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_cib_control: Disconnecting CIB
> > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: MSG: No message to dump
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> > Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: CRIT: cib_native_dispatch: Lost connection to the CIB service [5285/callback].
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: ERROR: do_exit: Could not recover from internal error
> > Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: do_exit: [crmd] stopped (2)
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Emergency Shutdown: Master Control process died.
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5057 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5062 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5063 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5064 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5065 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5066 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5067 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5068 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Killing pid 5069 with SIGTERM
> > Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5061]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
> > Mar 3 16:00:12 S-FL2-PLS-NAC auditd[2341]: Audit daemon rotating log files
> > Mar 3 19:06:54 S-FL2-PLS-NAC auditd[2341]: Audit daemon rotating log files
> >
> > Logs from the second node are:
> >
> > Mar 14 19:38:13 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23220 ms
> > Mar 14 19:38:36 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23160 ms
> > Mar 14 19:38:59 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23220 ms
> > Mar 14 19:39:22 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23180 ms
> > Mar 14 19:39:45 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23200 ms
> > Mar 14 19:40:08 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23150 ms
> > Mar 14 19:40:32 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23200 ms
> > <lots of these messages>
> >
> > Mar 14 19:41:18 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23250 ms
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: WARN: Late heartbeat: Node s-fl2-sls-nac.yardi.com: interval 23580 ms
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Heartbeat restart on node s-fl2-pls-nac.yardi.com
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Link s-fl2-pls-nac.yardi.com:eth3 up.
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for node s-fl2-pls-nac.yardi.com: status init
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Link s-fl2-pls-nac.yardi.com:eth1 up.
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for node s-fl2-pls-nac.yardi.com: status up
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: all clients are now paused
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 14 19:41:42 S-FL2-SLS-NAC last message repeated 2 times
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: info: Status update for node s-fl2-pls-nac.yardi.com: status active
> > Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2492]: info: cib_client_status_callback: Status update: Client s-fl2-pls-nac.yardi.com/cib now has status [join]
> > Mar 14 19:41:42 S-FL2-SLS-NAC heartbeat: [2411]: WARN: 1 lost packet(s) for [s-fl2-pls-nac.yardi.com] [42:44]
> > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now has status [init]
> > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: info: crmd_ha_status_callback: Ping node s-fl2-pls-nac.yardi.com is init
> > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now has status [up]
> > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: info: crmd_ha_status_callback: Ping node s-fl2-pls-nac.yardi.com is up
> > Mar 14 19:41:42 S-FL2-SLS-NAC crmd: [2496]: notice: crmd_ha_status_callback: Status update: Node s-fl2-pls-nac.yardi.com now has status [active]
> > Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2492]: info: cib_diff_notify: Local-only Change (client:2496, call: 175): 0.26.612(ok)
> > Mar 14 19:41:42 S-FL2-SLS-NAC tengine: [10419]: info: te_update_diff: Processing diff (cib_update): 0.26.612 -> 0.26.612
> > Mar 14 19:41:42 S-FL2-SLS-NAC cib: [2991]: info: write_cib_contents: Wrote version 0.26.612 of the CIB to disk (digest: e9e9c5aebf16b1faf617dca58907fc8c)
> > Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: info: No pkts missing from s-fl2-pls-nac.yardi.com!
> > Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 14 19:41:43 S-FL2-SLS-NAC heartbeat: [2411]: WARN: 1 lost packet(s) for [s-fl2-pls-nac.yardi.com] [47:49]
> > Mar 14 19:41:43 S-FL2-SLS-NAC crmd: [2496]: notice: crmd_client_status_callback: Status update: Client s-fl2-pls-nac.yardi.com/crmd now has status [online]
> > Mar 14 19:41:43 S-FL2-SLS-NAC crmd: [2496]: info: crmd_client_status_callback: Uncaching UUID for s-fl2-pls-nac.yardi.com
> > Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: info: No pkts missing from s-fl2-pls-nac.yardi.com!
> > Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: ERROR: Message hist queue is filling up (200 messages in queue)
> > Mar 14 19:41:44 S-FL2-SLS-NAC heartbeat: [2411]: info: all clients are now resumed
> > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: info: cib_process_readwrite: We are now in R/O mode
> > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_diff: Diff 0.26.600 -> 0.26.601 not applied to 0.26.612: current "num_updates" is greater than required
> > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed
> > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed
> > Mar 14 19:41:44 S-FL2-SLS-NAC cib: [2492]: WARN: cib_process_replace: Replacement 0.26.601 not applied to 0.26.612: current num_updates is greater than the replacement
> >
> > <lots of these messages>
> >
> > Mar 14 19:41:47 S-FL2-SLS-NAC ccm: [2491]: ERROR: Lost connection to heartbeat service. Need to bail out.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: get_uuid: get_uuid_by_name() call failed for host s-fl2-pls-nac.yardi.com
> > Mar 14 19:41:47 S-FL2-SLS-NAC cib: [2492]: ERROR: cib_ha_connection_destroy: Heartbeat connection lost! Exiting.
> > Mar 14 19:41:47 S-FL2-SLS-NAC cib: [2492]: info: uninitializeCib: The CIB has been deallocated.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition: s-fl2-sls-nac.yardi.com: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_HA_MESSAGE origin=route_message ]
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: update_abort_priority: Abort priority upgraded to 1000000
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: update_dc: Set DC to <null> (<null>)
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not connected to Heartbeat
> > Mar 14 19:41:47 S-FL2-SLS-NAC stonithd: [2494]: ERROR: Disconnected with heartbeat daemon
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv: #========= HA[outbound] message start ==========#
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: cib_native_msgready: Message pending on command channel [2492]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with 10 fields
> > Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: cib_native_msgready: Message pending on command channel [2492]
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
> > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: CRIT: attrd_ha_connection_destroy: Lost connection to heartbeat service!
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] : [origin=join_make_offer]
> > Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: MSG: No message to dump
> > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: cib_native_msgready: Message pending on command channel [2492]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: MSG: No message to dump
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: cib_native_msgready: Lost connection to the CIB service [2492].
> > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version= 1.0.7]
> > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: ERROR: MSG: No message to dump
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
> > Mar 14 19:41:47 S-FL2-SLS-NAC attrd: [2495]: info: cib_native_msgready: Lost connection to the CIB service [2492].
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] : [reference=join_offer-dc-1205503907-113]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=join_offer]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : [crm_sys_to=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] : [crm_sys_from=dc]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] : [crm_host_to= s-fl2-pls-nac.yardi.com]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [join_id=8]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending directed HA message (ref=join_offer-dc-1205503907-113) to [EMAIL PROTECTED] failed.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not connected to Heartbeat
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv: #========= HA[outbound] message start ==========#
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with 10 fields
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] : [origin=join_make_offer]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version= 1.0.7]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] : [reference=join_offer-dc-1205503907-114]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=join_offer]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : [crm_sys_to=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] : [crm_sys_from=dc]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] : [crm_host_to= s-fl2-sls-nac.yardi.com]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [join_id=8]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending directed HA message (ref=join_offer-dc-1205503907-114) to [EMAIL PROTECTED] failed.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_join_offer_all: join-8: Waiting on 2 outstanding join acks
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_election_count_vote: Election check: vote from s-fl2-pls-nac.yardi.com
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_election_count_vote: Election won over s-fl2-pls-nac.yardi.com
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition: s-fl2-sls-nac.yardi.com: State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: update_dc: Set DC to <null> (<null>)
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not connected to Heartbeat
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv: #========= HA[outbound] message start ==========#
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with 10 fields
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] : [origin=do_election_vote]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version= 1.0.7]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] : [reference=vote-crmd-1205503907-115]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=vote]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : [crm_sys_to=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] : [crm_sys_from=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] : [election-owner=a5ea4881-0e06-4ea3-83a9-1d0f2184109d]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [election-id=4]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending broadcast HA message (ref=vote-crmd-1205503907-115) to crmd@<all> failed.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: populate_cib_nodes: Requesting the list of configured nodes
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: get_uuid: get_uuid_by_name() call failed for host s-fl2-pls-nac.yardi.com
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: crm_abort: add_node_copy: Triggered non-fatal assert at xml.c:281 : src_node != NULL
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: crmd_cib_connection_destroy: Connection to the CIB terminated...
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: send_ha_message: Not connected to Heartbeat
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: crm_log_message_adv: #========= HA[outbound] message start ==========#
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG: Dumping message with 10 fields
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[0] : [origin=do_election_vote]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[1] : [t=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[2] : [version= 1.0.7]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[3] : [subt=request]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[4] : [reference=vote-crmd-1205503907-116]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[5] : [crm_task=vote]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[6] : [crm_sys_to=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[7] : [crm_sys_from=crmd]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[8] : [election-owner=a5ea4881-0e06-4ea3-83a9-1d0f2184109d]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: MSG[9] : [election-id=5]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: send_msg_via_ha: Sending broadcast HA message (ref=vote-crmd-1205503907-116) to crmd@<all> failed.
> > Mar 14 19:41:47 S-FL2-SLS-NAC mgmtd: [2497]: ERROR: Lost connection to heartbeat service.
> > Mar 14 19:41:47 S-FL2-SLS-NAC stonithd: [2494]: notice: /usr/lib/heartbeat/stonithd normally quit.
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: stonithd_op_result_ready: failed due to not on signon status.
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: ERROR: tengine_stonith_connection_destroy: Fencing daemon has left us
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_log: [[FSA]] Input I_ERROR from crmd_cib_connection_destroy() received in state (S_ELECTION)
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: update_abort_priority: Abort action 2 superceeded by 3
> > Mar 14 19:41:47 S-FL2-SLS-NAC pengine: [10420]: info: pengine_shutdown: Exiting PEngine (SIGTERM)
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition: s-fl2-sls-nac.yardi.com: State transition S_ELECTION -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_cib_connection_destroy ]
> > Mar 14 19:41:47 S-FL2-SLS-NAC tengine: [10419]: info: notify_crmd: Exiting after transition
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_release: DC role released
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to pengine: [10420]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to tengine: [10419]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_log: [[FSA]] Input I_STOP from do_recover() received in state (S_RECOVERY)
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_state_transition: s-fl2-sls-nac.yardi.com: State transition S_RECOVERY -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=do_recover ]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_dc_release: DC role released
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to pengine: [10420]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to tengine: [10419]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating the pengine
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to pengine: [10420]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating the tengine
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to tengine: [10419]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for subsystems to exit
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: do_log: [[FSA]] Input I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating the pengine
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to pengine: [10420]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating the tengine
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to tengine: [10419]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for subsystems to exit
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: mem_handle_func:IPC broken, ccm is dead before the client!
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: WARN: do_log: [[FSA]] Input I_RELEASE_SUCCESS from do_dc_release() received in state (S_STOPPING)
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating the pengine
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to pengine: [10420]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Terminating the tengine
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: stop_subsystem: Sent -TERM to tengine: [10419]
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_shutdown: Waiting for subsystems to exit
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: verify_stopped: Checking for active resources before exit
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: 10 pending LRM operations at shutdown
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: Event-Gateway:25
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: Policy-Manager:41
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: Event-Correlation:39
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: Check-Drives:13
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: IPaddr_corp:19
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: Master-Database:21
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: IPaddr_mgmt:15
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: IPaddr_log:17
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: Events-Database:23
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: ghash_print_pending: Pending action: Admin-Notify:43
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource Events-Database was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource Event-Gateway was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource Policy-Manager was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource IPaddr_corp was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource IPaddr_mgmt was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource IPaddr_log was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource Master-Database was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource Event-Correlation was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource Check-Drives was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: verify_stopped: Resource Admin-Notify was active at shutdown. You may ignore this error if it is unmanaged.
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_exit: Performing A_EXIT_1 - forcefully exiting the CRMd
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: ERROR: do_exit: Could not recover from internal error
> > Mar 14 19:41:47 S-FL2-SLS-NAC crmd: [2496]: info: do_exit: [crmd] stopped (2)
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Emergency Shutdown: Master Control process died.
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2411 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2443 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2444 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2445 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2446 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2447 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2448 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2449 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Killing pid 2450 with SIGTERM
> > Mar 14 19:41:48 S-FL2-SLS-NAC heartbeat: [2442]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
> > Mar 14 23:02:12 S-FL2-SLS-NAC auditd[2292]: Audit daemon rotating log files
> > Mar 15 07:22:19 S-FL2-SLS-NAC auditd[2292]: Audit daemon rotating log files
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
