Hello,

Today I created three virtual machines (guest OS: Red Hat Enterprise Linux 4 Update 4) with VMware Workstation 6 on Windows XP. I then installed heartbeat-2.1.3 on each VM from the source package. Each guest OS has two Ethernet cards (eth0 and eth1), both of which are used as heartbeat media. I added "crm on" at the end of ha.cf; there is no "autojoin" directive in ha.cf.

After configuring heartbeat successfully, I started it on all three nodes (named "node2", "master" and "slave"). Heartbeat runs fine on each node.

What I want to do is test hb_addnode and hb_delnode. The test steps were:

*STEP 1
First, I ran the command "hb_delnode master" on node "slave". Then "" appeared in the logfile.

*STEP 2
So I had to stop the heartbeat software running on master (/etc/init.d/heartbeat stop); after that, hb_delnode succeeded on node "slave". I confirmed this with hb_gui.

*STEP 3
I let node "master" rejoin the cluster. When I then ran "hb_delnode master" on node "slave", it failed, and node "node2" rebooted!

I can't find enough documentation about dynamic node management. Does anybody know how to use these commands?

Thanks!
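Since the ccm error in the log below asks that the ha.cf files on all nodes carry the same node list, here is a minimal sketch of how that could be checked offline. It assumes ha.cf declares cluster members with "node name [name ...]" lines and that everything after "#" is a comment. The function names parse_nodes and compare_node_lists and the sample file contents are hypothetical and are not part of heartbeat. It only looks at ha.cf text and does not read the hostcache files, which may also matter for nodes added with hb_addnode.

```python
def parse_nodes(ha_cf_text):
    """Return the set of node names declared by `node` lines in ha.cf text."""
    nodes = set()
    for raw in ha_cf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "node":
            nodes.update(fields[1:])
    return nodes


def compare_node_lists(configs):
    """Map each host to the node names its ha.cf lacks.

    `configs` maps a host name to the text of that host's ha.cf.
    Hosts whose list already matches the union of all lists are omitted.
    """
    parsed = {host: parse_nodes(text) for host, text in configs.items()}
    union = set()
    for names in parsed.values():
        union |= names
    missing = {}
    for host, names in parsed.items():
        diff = union - names
        if diff:
            missing[host] = diff
    return missing


if __name__ == "__main__":
    # Hypothetical ha.cf contents, for illustration only.
    sample = {
        "slave": "node node2 master slave\ncrm on\n",
        "node2": "node node2 slave  # master missing here\ncrm on\n",
    }
    for host, names in sorted(compare_node_lists(sample).items()):
        print(host, "is missing:", ", ".join(sorted(names)))
```

An empty result from compare_node_lists would mean the ha.cf node lists agree; it would not by itself rule out a mismatch coming from the hostcache files.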
The log on "node2" looks like this:

//////////////////////////////////////////////////////////////////////////////////////////
heartbeat[4767]: 2008/03/31_19:08:17 info: hb_add_one_node: Adding new node[master] to configuration.
ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: Node count from node slave does not agree: local count=2, count in message=3
ccm[4806]: 2008/03/31_19:08:38 ERROR: Please make sure ha.cf files on all nodes have same nodes list or add "autojoin any" to ha.cf
ccm[4806]: 2008/03/31_19:08:38 info: If this problem persists, check the heartbeat 'hostcache' files in the cluster to look for problems.
cib[4807]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead before the client!
cib[4807]: 2008/03/31_19:08:38 ERROR: cib_ccm_dispatch: CCM connection appears to have failed: rc=-1.
cib[4807]: 2008/03/31_19:08:38 ERROR: cib_ccm_dispatch: Exiting to recover from CCM connection failure
crmd[4811]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead before the client!
crmd[4811]: 2008/03/31_19:08:38 ERROR: ccm_dispatch: CCM connection appears to have failed: rc=-1.
crmd[4811]: 2008/03/31_19:08:38 ERROR: do_log: [[FSA]] Input I_ERROR from ccm_dispatch() received in state (S_IDLE)
crmd[4811]: 2008/03/31_19:08:38 info: do_state_transition: State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_CCM_CALLBACK origin=ccm_dispatch ]
crmd[4811]: 2008/03/31_19:08:38 ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
crmd[4811]: 2008/03/31_19:08:38 WARN: do_election_vote: Not voting in election, we're in state S_RECOVERY
crmd[4811]: 2008/03/31_19:08:38 info: do_dc_release: DC role released
mgmtd[4812]: 2008/03/31_19:08:38 CRIT: cib_native_dispatch: Lost connection to the CIB service [4807/callback].
tengine[4841]: 2008/03/31_19:08:38 ERROR: cib_native_msgready: Message pending on command channel [4807]
heartbeat[4767]: 2008/03/31_19:08:38 WARN: Managed /opt/ha/lib/heartbeat/ccm process 4806 exited with return code 1.
attrd[4810]: 2008/03/31_19:08:38 ERROR: cib_native_msgready: Message pending on command channel [4807]
crmd[4811]: 2008/03/31_19:08:38 info: stop_subsystem: Sent -TERM to pengine: [4842]
pengine[4842]: 2008/03/31_19:08:38 info: pengine_shutdown: Exiting PEngine (SIGTERM)
heartbeat[4767]: 2008/03/31_19:08:38 EMERG: Rebooting system. Reason: /opt/ha/lib/heartbeat/ccm
attrd[4810]: 2008/03/31_19:08:38 ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
tengine[4841]: 2008/03/31_19:08:38 ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
crmd[4811]: 2008/03/31_19:08:38 info: stop_subsystem: Sent -TERM to tengine: [4841]
attrd[4810]: 2008/03/31_19:08:38 ERROR: MSG: No message to dump
tengine[4841]: 2008/03/31_19:08:38 ERROR: MSG: No message to dump
crmd[4811]: 2008/03/31_19:08:38 ERROR: do_log: [[FSA]] Input I_TERMINATE from do_recover() received in state (S_RECOVERY)
attrd[4810]: 2008/03/31_19:08:38 info: cib_native_msgready: Lost connection to the CIB service [4807].
tengine[4841]: 2008/03/31_19:08:38 info: cib_native_msgready: Lost connection to the CIB service [4807].
crmd[4811]: 2008/03/31_19:08:38 info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
attrd[4810]: 2008/03/31_19:08:38 CRIT: cib_native_dispatch: Lost connection to the CIB service [4807/callback].
tengine[4841]: 2008/03/31_19:08:38 CRIT: cib_native_dispatch: Lost connection to the CIB service [4807/callback].
crmd[4811]: 2008/03/31_19:08:38 info: do_shutdown: Terminating the pengine
attrd[4810]: 2008/03/31_19:08:38 ERROR: attrd_cib_connection_destroy: Connection to the CIB terminated...
tengine[4841]: 2008/03/31_19:08:39 info: update_abort_priority: Abort priority upgraded to 1000000
crmd[4811]: 2008/03/31_19:08:39 info: stop_subsystem: Sent -TERM to pengine: [4842]
tengine[4841]: 2008/03/31_19:08:39 info: update_abort_priority: Abort action 2 superceeded by 3
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Terminating the tengine
crmd[4811]: 2008/03/31_19:08:39 info: stop_subsystem: Sent -TERM to tengine: [4841]
tengine[4841]: 2008/03/31_19:08:39 info: notify_crmd: Exiting after transition
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Waiting for subsystems to exit
tengine[4841]: 2008/03/31_19:08:39 info: te_init: Exiting tengine
crmd[4811]: 2008/03/31_19:08:39 WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: All subsystems stopped, continuing
crmd[4811]: 2008/03/31_19:08:39 WARN: do_log: [[FSA]] Input I_PENDING from do_election_vote() received in state (S_TERMINATE)
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Terminating the pengine
crmd[4811]: 2008/03/31_19:08:39 info: stop_subsystem: Sent -TERM to pengine: [4842]
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Terminating the tengine
crmd[4811]: 2008/03/31_19:08:39 info: stop_subsystem: Sent -TERM to tengine: [4841]
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Waiting for subsystems to exit
crmd[4811]: 2008/03/31_19:08:39 WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: All subsystems stopped, continuing
crmd[4811]: 2008/03/31_19:08:39 WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1260 ms (> 100 ms) before being called (GSource: 0x8e95b10)
crmd[4811]: 2008/03/31_19:08:39 info: G_SIG_dispatch: started at 430009289 should have started at 430009163
crmd[4811]: 2008/03/31_19:08:39 info: crmdManagedChildDied: Process tengine:[4841] exited (signal=0, exitcode=0)
crmd[4811]: 2008/03/31_19:08:39 info: crmdManagedChildDied: Process pengine:[4842] exited (signal=0, exitcode=0)
crmd[4811]: 2008/03/31_19:08:39 ERROR: cib_native_msgready: Message pending on command channel [4807]
crmd[4811]: 2008/03/31_19:08:39 ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
crmd[4811]: 2008/03/31_19:08:39 ERROR: MSG: No message to dump
crmd[4811]: 2008/03/31_19:08:39 info: cib_native_msgready: Lost connection to the CIB service [4807].
crmd[4811]: 2008/03/31_19:08:39 CRIT: cib_native_dispatch: Lost connection to the CIB service [4807/callback].
crmd[4811]: 2008/03/31_19:08:39 ERROR: crmd_cib_connection_destroy: Connection to the CIB terminated...
crmd[4811]: 2008/03/31_19:08:39 WARN: do_log: [[FSA]] Input I_RELEASE_SUCCESS from do_dc_release() received in state (S_TERMINATE)
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: All subsystems stopped, continuing
crmd[4811]: 2008/03/31_19:08:39 info: do_lrm_control: Disconnected from the LRM
crmd[4811]: 2008/03/31_19:08:39 info: do_ha_control: Disconnected from Heartbeat
crmd[4811]: 2008/03/31_19:08:39 info: do_cib_control: Disconnecting CIB
crmd[4811]: 2008/03/31_19:08:39 ERROR: send_ipc_message: IPC Channel to 4807 is not connected
crmd[4811]: 2008/03/31_19:08:39 WARN: crm_log_message_adv: #========= IPC[outbound] message start ==========#
crmd[4811]: 2008/03/31_19:08:39 WARN: MSG: Dumping message with 5 fields
crmd[4811]: 2008/03/31_19:08:39 WARN: MSG[0] : [__name__=cib_command]
crmd[4811]: 2008/03/31_19:08:39 WARN: MSG[1] : [t=cib]
/////////////////////////////////////////////////////////////////////////////////////////

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
