[Linux-HA] Problem about hb_addnode and hb_delnode

Mon, 31 Mar 2008 23:49:55 -0700

Hello,
Today I created three virtual machines (guest OS: Red Hat Enterprise Linux 4 U4)
with VMware Workstation 6 on Windows XP. Then I installed heartbeat-2.1.3 on
each VM from the source package. Each guest OS has two Ethernet cards (eth0 and
eth1), both of which are used as heartbeat media. "crm on" was added at the end
of ha.cf, and "autojoin" does not appear in ha.cf.
Once heartbeat was configured successfully, I started it on all three nodes
(named "node2", "master" and "slave").
Heartbeat runs OK on each node.
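For reference, an ha.cf matching the setup described above might look roughly
like the fragment below. This is only a sketch: the "bcast" medium type is an
assumption (the description only says that both cards carry heartbeat traffic),
timing and logging directives are left out, and the node names must match the
output of "uname -n" on each host.

```
# Sketch of ha.cf, identical on all three nodes (assumed, not the actual file)
node node2 master slave   # static node list; no "autojoin" directive is set
bcast eth0 eth1           # assumption: broadcast heartbeats on both cards
crm on                    # enable the version 2 cluster resource manager
```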
What I want to do is test hb_addnode and hb_delnode.
The test steps are described below:
*STEP 1 First, I ran the command "hb_delnode master" on node "slave". Then ""
appeared in the logfile.
*STEP 2 So I had to stop the heartbeat software running on master
(/etc/init.d/heartbeat stop); after that, hb_delnode succeeded on node slave.
This can be confirmed with hb_gui.
*STEP 3 I let node "master" rejoin the cluster. When I then ran
"hb_delnode master" on node "slave", it failed, and node "node2" rebooted!
I can't find enough documentation about dynamic node management. Does anybody
know how to use these commands? Thanks!



The log on "node2" looks like this:
//////////////////////////////////////////////////////////////////////////////////////////
heartbeat[4767]: 2008/03/31_19:08:17 info: hb_add_one_node: Adding new 
node[master] to configuration.
ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: Node count from node 
slave does not agree: local count=2, count in message=3
ccm[4806]: 2008/03/31_19:08:38 ERROR: Please make sure ha.cf files on all nodes 
have same nodes list or add "autojoin any" to ha.cf
ccm[4806]: 2008/03/31_19:08:38 info: If this problem persists, check the 
heartbeat 'hostcache' files in the cluster to look for problems.
cib[4807]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead 
before the client!
cib[4807]: 2008/03/31_19:08:38 ERROR: cib_ccm_dispatch: CCM connection appears 
to have failed: rc=-1.
cib[4807]: 2008/03/31_19:08:38 ERROR: cib_ccm_dispatch: Exiting to recover from 
CCM connection failure
crmd[4811]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead 
before the client!
crmd[4811]: 2008/03/31_19:08:38 ERROR: ccm_dispatch: CCM connection appears to 
have failed: rc=-1.
crmd[4811]: 2008/03/31_19:08:38 ERROR: do_log: [[FSA]] Input I_ERROR from 
ccm_dispatch() received in state (S_IDLE)
crmd[4811]: 2008/03/31_19:08:38 info: do_state_transition: State transition 
S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_CCM_CALLBACK origin=ccm_dispatch ]
crmd[4811]: 2008/03/31_19:08:38 ERROR: do_recover: Action A_RECOVER 
(0000000001000000) not supported
crmd[4811]: 2008/03/31_19:08:38 WARN: do_election_vote: Not voting in election, 
we're in state S_RECOVERY
crmd[4811]: 2008/03/31_19:08:38 info: do_dc_release: DC role released
mgmtd[4812]: 2008/03/31_19:08:38 CRIT: cib_native_dispatch: Lost connection to 
the CIB service [4807/callback].
tengine[4841]: 2008/03/31_19:08:38 ERROR: cib_native_msgready: Message pending 
on command channel [4807]
heartbeat[4767]: 2008/03/31_19:08:38 WARN: Managed /opt/ha/lib/heartbeat/ccm 
process 4806 exited with return code 1.
attrd[4810]: 2008/03/31_19:08:38 ERROR: cib_native_msgready: Message pending on 
command channel [4807]
crmd[4811]: 2008/03/31_19:08:38 info: stop_subsystem: Sent -TERM to pengine: 
[4842]
pengine[4842]: 2008/03/31_19:08:38 info: pengine_shutdown: Exiting PEngine 
(SIGTERM)
heartbeat[4767]: 2008/03/31_19:08:38 EMERG: Rebooting system.  Reason: 
/opt/ha/lib/heartbeat/ccm
attrd[4810]: 2008/03/31_19:08:38 ERROR: crm_log_message_adv: #========= cib:cmd 
message start ==========#
tengine[4841]: 2008/03/31_19:08:38 ERROR: crm_log_message_adv: #========= 
cib:cmd message start ==========#
crmd[4811]: 2008/03/31_19:08:38 info: stop_subsystem: Sent -TERM to tengine: 
[4841]
attrd[4810]: 2008/03/31_19:08:38 ERROR: MSG: No message to dump
tengine[4841]: 2008/03/31_19:08:38 ERROR: MSG: No message to dump
crmd[4811]: 2008/03/31_19:08:38 ERROR: do_log: [[FSA]] Input I_TERMINATE from 
do_recover() received in state (S_RECOVERY)
attrd[4810]: 2008/03/31_19:08:38 info: cib_native_msgready: Lost connection to 
the CIB service [4807].
tengine[4841]: 2008/03/31_19:08:38 info: cib_native_msgready: Lost connection 
to the CIB service [4807].
crmd[4811]: 2008/03/31_19:08:38 info: do_state_transition: State transition 
S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL 
origin=do_recover ]
attrd[4810]: 2008/03/31_19:08:38 CRIT: cib_native_dispatch: Lost connection to 
the CIB service [4807/callback].
tengine[4841]: 2008/03/31_19:08:38 CRIT: cib_native_dispatch: Lost connection 
to the CIB service [4807/callback].
crmd[4811]: 2008/03/31_19:08:38 info: do_shutdown: Terminating the pengine
attrd[4810]: 2008/03/31_19:08:38 ERROR: attrd_cib_connection_destroy: 
Connection to the CIB terminated...
tengine[4841]: 2008/03/31_19:08:39 info: update_abort_priority: Abort priority 
upgraded to 1000000
crmd[4811]: 2008/03/31_19:08:39 info: stop_subsystem: Sent -TERM to pengine: 
[4842]
tengine[4841]: 2008/03/31_19:08:39 info: update_abort_priority: Abort action 2 
superceeded by 3
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Terminating the tengine
crmd[4811]: 2008/03/31_19:08:39 info: stop_subsystem: Sent -TERM to tengine: 
[4841]
tengine[4841]: 2008/03/31_19:08:39 info: notify_crmd: Exiting after transition
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Waiting for subsystems to 
exit
tengine[4841]: 2008/03/31_19:08:39 info: te_init: Exiting tengine
crmd[4811]: 2008/03/31_19:08:39 WARN: register_fsa_input_adv: do_shutdown 
stalled the FSA with pending inputs
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: All subsystems stopped, 
continuing
crmd[4811]: 2008/03/31_19:08:39 WARN: do_log: [[FSA]] Input I_PENDING from 
do_election_vote() received in state (S_TERMINATE)
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Terminating the pengine
crmd[4811]: 2008/03/31_19:08:39 info: stop_subsystem: Sent -TERM to pengine: 
[4842]
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Terminating the tengine
crmd[4811]: 2008/03/31_19:08:39 info: stop_subsystem: Sent -TERM to tengine: 
[4841]
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: Waiting for subsystems to 
exit
crmd[4811]: 2008/03/31_19:08:39 WARN: register_fsa_input_adv: do_shutdown 
stalled the FSA with pending inputs
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: All subsystems stopped, 
continuing
crmd[4811]: 2008/03/31_19:08:39 WARN: G_SIG_dispatch: Dispatch function for 
SIGCHLD was delayed 1260 ms (> 100 ms) before being called (GSource: 0x8e95b10)
crmd[4811]: 2008/03/31_19:08:39 info: G_SIG_dispatch: started at 430009289 
should have started at 430009163
crmd[4811]: 2008/03/31_19:08:39 info: crmdManagedChildDied: Process 
tengine:[4841] exited (signal=0, exitcode=0)
crmd[4811]: 2008/03/31_19:08:39 info: crmdManagedChildDied: Process 
pengine:[4842] exited (signal=0, exitcode=0)
crmd[4811]: 2008/03/31_19:08:39 ERROR: cib_native_msgready: Message pending on 
command channel [4807]
crmd[4811]: 2008/03/31_19:08:39 ERROR: crm_log_message_adv: #========= cib:cmd 
message start ==========#
crmd[4811]: 2008/03/31_19:08:39 ERROR: MSG: No message to dump
crmd[4811]: 2008/03/31_19:08:39 info: cib_native_msgready: Lost connection to 
the CIB service [4807].
crmd[4811]: 2008/03/31_19:08:39 CRIT: cib_native_dispatch: Lost connection to 
the CIB service [4807/callback].
crmd[4811]: 2008/03/31_19:08:39 ERROR: crmd_cib_connection_destroy: Connection 
to the CIB terminated...
crmd[4811]: 2008/03/31_19:08:39 WARN: do_log: [[FSA]] Input I_RELEASE_SUCCESS 
from do_dc_release() received in state (S_TERMINATE)
crmd[4811]: 2008/03/31_19:08:39 info: do_shutdown: All subsystems stopped, 
continuing
crmd[4811]: 2008/03/31_19:08:39 info: do_lrm_control: Disconnected from the LRM
crmd[4811]: 2008/03/31_19:08:39 info: do_ha_control: Disconnected from Heartbeat
crmd[4811]: 2008/03/31_19:08:39 info: do_cib_control: Disconnecting CIB
crmd[4811]: 2008/03/31_19:08:39 ERROR: send_ipc_message: IPC Channel to 4807 is 
not connected
crmd[4811]: 2008/03/31_19:08:39 WARN: crm_log_message_adv: #========= 
IPC[outbound] message start ==========#
crmd[4811]: 2008/03/31_19:08:39 WARN: MSG: Dumping message with 5 fields
crmd[4811]: 2008/03/31_19:08:39 WARN: MSG[0] : [__name__=cib_command]
crmd[4811]: 2008/03/31_19:08:39 WARN: MSG[1] : [t=cib]
/////////////////////////////////////////////////////////////////////////////////////////
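
To make a log like the one above easier to read, the entries at ERROR level or
higher can be filtered out in order. The following is a minimal sketch, not
part of heartbeat: it assumes the "daemon[pid]: date_time level: message"
layout seen above and treats any line without that prefix as a mail-wrapped
continuation of the previous entry. The names LINE_RE, parse_log and serious
are hypothetical helpers chosen for this example.

```python
import re

# Heartbeat-style log line: "daemon[pid]: YYYY/MM/DD_HH:MM:SS level: message"
LINE_RE = re.compile(
    r"^(?P<daemon>\w+)\[(?P<pid>\d+)\]: "
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Za-z]+): (?P<msg>.*)$"
)


def parse_log(text):
    """Return a list of entry dicts, joining wrapped continuation lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries:
            # No "daemon[pid]:" prefix: this line continues the previous entry.
            entries[-1]["msg"] += " " + line
    return entries


def serious(entries, levels=("ERROR", "CRIT", "EMERG")):
    """Keep only entries whose level is in `levels`, preserving their order."""
    return [e for e in entries if e["level"] in levels]


# Three entries copied from the log above, including the mail line wrapping.
sample = """\
ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: Node count from node
slave does not agree: local count=2, count in message=3
cib[4807]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead
before the client!
heartbeat[4767]: 2008/03/31_19:08:38 EMERG: Rebooting system.  Reason:
/opt/ha/lib/heartbeat/ccm
"""

for entry in serious(parse_log(sample)):
    print(entry["stamp"], entry["daemon"], entry["level"], entry["msg"])
```

Run against the full log, a filter like this puts the ccm node-count error
first, followed by the cib and crmd connection failures and the EMERG reboot
message from heartbeat, all at 19:08:38.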



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
