Hi,

I'm facing a problem at first start on my two-node HA cluster, server2 and server3 (both running Fedora 11).

The start on server2 seems to work fine: in hb_gui I can see server2 online and server3 offline. But when I start server3, it does not seem to connect to the cluster, and in the end crm_mon displays only server3, with no mention of server2 (from the log, it seems that it fails to calculate the UUID of server2).

A ping between the two heartbeat interfaces works fine, so I have no idea why it does not work.

Any idea? (The log is below.)

Thanks for your help.
Alain

cib[3137]: 2009/08/12_13:14:17 info: register_heartbeat_conn: Hostname: server3
cib[3137]: 2009/08/12_13:14:17 info: register_heartbeat_conn: UUID: 7c6eb9f6-c938-4012-8ba0-4d5c21dd1315
cib[3137]: 2009/08/12_13:14:17 info: ccm_connect: Registering with CCM...
cib[3137]: 2009/08/12_13:14:17 WARN: ccm_connect: CCM Activation failed
cib[3137]: 2009/08/12_13:14:17 WARN: ccm_connect: CCM Connection failed 1 times (30 max)
stonithd[3139]: 2009/08/12_13:14:17 notice: /usr/lib64/heartbeat/stonithd start up successfully.
stonithd[3139]: 2009/08/12_13:14:17 info: G_main_add_SignalHandler: Added signal handler for signal 17
mgmtd[3135]: 2009/08/12_13:14:18 info: init_crm: live
mgmtd[3135]: 2009/08/12_13:14:18 info: login to cib live: 0, ret:-10
crmd[3142]: 2009/08/12_13:14:18 info: do_cib_control: Could not connect to the CIB service: connection failed
crmd[3142]: 2009/08/12_13:14:18 WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
crmd[3142]: 2009/08/12_13:14:18 info: crmd_init: Starting crmd's mainloop
mgmtd[3135]: 2009/08/12_13:14:19 info: login to cib live: 1, ret:-10
mgmtd[3135]: 2009/08/12_13:14:20 info: login to cib live: 2, ret:-10
crmd[3142]: 2009/08/12_13:14:20 info: crm_timer_popped: Wait Timer (I_NULL) just popped!
cib[3137]: 2009/08/12_13:14:20 info: ccm_connect: Registering with CCM...
cib[3137]: 2009/08/12_13:14:20 WARN: ccm_connect: CCM Activation failed
cib[3137]: 2009/08/12_13:14:20 WARN: ccm_connect: CCM Connection failed 2 times (30 max)
mgmtd[3135]: 2009/08/12_13:14:21 info: login to cib live: 3, ret:-10
crmd[3142]: 2009/08/12_13:14:21 info: do_cib_control: Could not connect to the CIB service: connection failed
crmd[3142]: 2009/08/12_13:14:21 WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
ccm[3136]: 2009/08/12_13:14:21 debug: quorum plugin: majority
ccm[3136]: 2009/08/12_13:14:21 debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
ccm[3136]: 2009/08/12_13:14:21 debug: total_node_count=2, total_quorum_votes=200
ccm[3136]: 2009/08/12_13:14:21 debug: quorum plugin: twonodes
ccm[3136]: 2009/08/12_13:14:21 debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
ccm[3136]: 2009/08/12_13:14:21 debug: total_node_count=2, total_quorum_votes=200
ccm[3136]: 2009/08/12_13:14:21 info: Break tie for 2 nodes cluster
ccm[3136]: 2009/08/12_13:14:21 info: G_main_add_SignalHandler: Added signal handler for signal 15
mgmtd[3135]: 2009/08/12_13:14:22 info: login to cib live: 4, ret:-10
mgmtd[3135]: 2009/08/12_13:14:23 info: login to cib failed: live
mgmtd[3135]: 2009/08/12_13:14:23 ERROR: Can't initialize management library.Shutting down.(-1)
heartbeat[3127]: 2009/08/12_13:14:23 WARN: Managed /usr/lib64/heartbeat/mgmtd -v -t process 3135 exited with return code 1.
heartbeat[3127]: 2009/08/12_13:14:23 ERROR: Respawning client "/usr/lib64/heartbeat/mgmtd -v -t":
heartbeat[3127]: 2009/08/12_13:14:23 info: Starting child client "/usr/lib64/heartbeat/mgmtd -v -t" (0,0)
crmd[3142]: 2009/08/12_13:14:23 info: crm_timer_popped: Wait Timer (I_NULL) just popped!
cib[3137]: 2009/08/12_13:14:23 info: ccm_connect: Registering with CCM...
cib[3137]: 2009/08/12_13:14:23 info: cib_init: Requesting the list of configured nodes
cib[3137]: 2009/08/12_13:14:23 info: cib_init: Starting cib mainloop
cib[3137]: 2009/08/12_13:14:23 info: cib_client_status_callback: Status update: Client server3/cib now has status [join]
cib[3137]: 2009/08/12_13:14:23 info: crm_new_peer: Node 0 is now known as server3
cib[3137]: 2009/08/12_13:14:23 info: crm_update_peer_proc: server3.cib is now online
cib[3137]: 2009/08/12_13:14:23 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
cib[3137]: 2009/08/12_13:14:23 info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
cib[3137]: 2009/08/12_13:14:23 info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=1)
cib[3137]: 2009/08/12_13:14:23 info: crm_get_peer: Node server3 now has id: 1
cib[3137]: 2009/08/12_13:14:23 info: crm_update_peer: Node server3: id=1 state=member (new) addr=(null) votes=-1 born=1 seen=1 proc=00000000000000000000000000000100
cib[3137]: 2009/08/12_13:14:23 info: crm_update_peer_proc: server3.ais is now online
cib[3137]: 2009/08/12_13:14:23 info: crm_update_peer_proc: server3.crmd is now online
cib[3144]: 2009/08/12_13:14:23 info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-9.raw
cib[3144]: 2009/08/12_13:14:23 info: write_cib_contents: Wrote version 0.7.0 of the CIB to disk (digest: c41e9acb00ec2c131a3375b2b6b0faec)
cib[3144]: 2009/08/12_13:14:23 info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.Mx3kVH (digest: /var/lib/heartbeat/crm/cib.hewZvu)
cib[3137]: 2009/08/12_13:14:24 info: cib_client_status_callback: Status update: Client server3/cib now has status [online]
heartbeat[3143]: 2009/08/12_13:14:24 info: Starting "/usr/lib64/heartbeat/mgmtd -v -t" as uid 0 gid 0 (pid 3143)
mgmtd[3143]: 2009/08/12_13:14:24 info: Pacemaker-mgmt Hg Version: 1f7d1be39d34c38d64d971ac79a21339917fba4a
mgmtd[3143]: 2009/08/12_13:14:24 info: G_main_add_SignalHandler: Added signal handler for signal 15
mgmtd[3143]: 2009/08/12_13:14:24 debug: Enabling coredumps
mgmtd[3143]: 2009/08/12_13:14:24 info: G_main_add_SignalHandler: Added signal handler for signal 10
mgmtd[3143]: 2009/08/12_13:14:24 info: G_main_add_SignalHandler: Added signal handler for signal 12
mgmtd[3143]: 2009/08/12_13:14:24 info: init_crm: live
attrd[3141]: 2009/08/12_13:14:24 info: cib_connect: Connected to the CIB after 7 signon attempts
attrd[3141]: 2009/08/12_13:14:24 info: cib_connect: Sending full refresh
crmd[3142]: 2009/08/12_13:14:24 info: do_cib_control: CIB connection established
crmd[3142]: 2009/08/12_13:14:24 info: crm_cluster_connect: Connecting to Heartbeat
crmd[3142]: 2009/08/12_13:14:24 info: register_heartbeat_conn: Hostname: server3
crmd[3142]: 2009/08/12_13:14:24 info: register_heartbeat_conn: UUID: 7c6eb9f6-c938-4012-8ba0-4d5c21dd1315
mgmtd[3143]: 2009/08/12_13:14:24 debug: main: run the loop...
mgmtd[3143]: 2009/08/12_13:14:24 info: Started.
crmd[3142]: 2009/08/12_13:14:25 info: do_ha_control: Connected to the cluster
crmd[3142]: 2009/08/12_13:14:25 info: do_ccm_control: CCM connection established... waiting for first callback
crmd[3142]: 2009/08/12_13:14:25 info: do_started: Delaying start, CCM (0000000000100000) not connected
crmd[3142]: 2009/08/12_13:14:25 info: config_query_callback: Checking for expired actions every 900000ms
crmd[3142]: 2009/08/12_13:14:25 notice: crmd_client_status_callback: Status update: Client server3/crmd now has status [online] (DC=false)
crmd[3142]: 2009/08/12_13:14:25 info: crm_new_peer: Node 0 is now known as server3
crmd[3142]: 2009/08/12_13:14:25 info: crm_update_peer_proc: server3.crmd is now online
crmd[3142]: 2009/08/12_13:14:25 info: crmd_client_status_callback: Not the DC
crmd[3142]: 2009/08/12_13:14:25 notice: crmd_client_status_callback: Status update: Client server3/crmd now has status [online] (DC=false)
crmd[3142]: 2009/08/12_13:14:25 info: crmd_client_status_callback: Not the DC
crmd[3142]: 2009/08/12_13:14:25 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
crmd[3142]: 2009/08/12_13:14:25 info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
crmd[3142]: 2009/08/12_13:14:25 info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=1)
crmd[3142]: 2009/08/12_13:14:25 info: ccm_event_detail: NEW MEMBERSHIP: trans=1, nodes=1, new=1, lost=0 n_idx=0, new_idx=0, old_idx=3
crmd[3142]: 2009/08/12_13:14:25 info: ccm_event_detail: CURRENT: server3 [nodeid=1, born=1]
crmd[3142]: 2009/08/12_13:14:25 info: ccm_event_detail: NEW: server3 [nodeid=1, born=1]
crmd[3142]: 2009/08/12_13:14:25 info: crm_get_peer: Node server3 now has id: 1
crmd[3142]: 2009/08/12_13:14:25 info: crm_update_peer: Node server3: id=1 state=member (new) addr=(null) votes=-1 born=1 seen=1 proc=00000000000000000000000000000200
crmd[3142]: 2009/08/12_13:14:25 info: crm_update_peer_proc: server3.ais is now online
crmd[3142]: 2009/08/12_13:14:25 info: do_started: The local CRM is operational
crmd[3142]: 2009/08/12_13:14:25 info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
crmd[3142]: 2009/08/12_13:14:36 info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
crmd[3142]: 2009/08/12_13:14:36 WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
crmd[3142]: 2009/08/12_13:14:36 info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped
crmd[3142]: 2009/08/12_13:14:36 info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
crmd[3142]: 2009/08/12_13:14:36 info: do_te_control: Registering TE UUID: ecf08a7c-6fb8-43e0-a8b6-011ab83ad5f7
crmd[3142]: 2009/08/12_13:14:40 info: update_dc: Set DC to server3 (3.0.1)
crmd[3142]: 2009/08/12_13:14:41 info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
crmd[3142]: 2009/08/12_13:14:41 info: do_state_transition: All 1 cluster nodes responded to the join offer.
crmd[3142]: 2009/08/12_13:14:41 info: do_dc_join_finalize: join-1: Syncing the CIB from server3 to the rest of the cluster
cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/14, version=0.7.0): ok (rc=0)
cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/15, version=0.7.0): ok (rc=0)
crmd[3142]: 2009/08/12_13:14:41 info: update_attrd: Connecting to attrd...
crmd[3142]: 2009/08/12_13:14:41 info: attrd_update: Sent update: terminate=(null) for server3
crmd[3142]: 2009/08/12_13:14:41 info: attrd_update: Sent update: shutdown=(null) for server3
attrd[3141]: 2009/08/12_13:14:41 info: find_hash_entry: Creating hash entry for terminate
attrd[3141]: 2009/08/12_13:14:41 info: find_hash_entry: Creating hash entry for shutdown
crmd[3142]: 2009/08/12_13:14:41 info: do_dc_join_ack: join-1: Updating node state to member for server3
cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation complete: op cib_delete for section //node_sta...@uname='server3']/transient_attributes (origin=local/crmd/16, version=0.7.0): ok (rc=0)
crmd[3142]: 2009/08/12_13:14:41 info: erase_xpath_callback: Deletion of "//node_sta...@uname='server3']/transient_attributes": ok (rc=0)
cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation complete: op cib_delete for section //node_sta...@uname='server3']/lrm (origin=local/crmd/17, version=0.7.0): ok (rc=0)
crmd[3142]: 2009/08/12_13:14:41 info: erase_xpath_callback: Deletion of "//node_sta...@uname='server3']/lrm": ok (rc=0)
cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation complete: op cib_delete for section //node_sta...@uname='server3']/lrm (origin=local/crmd/18, version=0.7.0): ok (rc=0)
crmd[3142]: 2009/08/12_13:14:41 info: erase_xpath_callback: Deletion of "//node_sta...@uname='server3']/lrm": ok (rc=0)
crmd[3142]: 2009/08/12_13:14:41 info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
crmd[3142]: 2009/08/12_13:14:41 info: populate_cib_nodes_ha: Requesting the list of configured nodes
crmd[3142]: 2009/08/12_13:14:43 WARN: get_uuid: Could not calculate UUID for server2
crmd[3142]: 2009/08/12_13:14:43 WARN: populate_cib_nodes_ha: Node server2: no uuid found
crmd[3142]: 2009/08/12_13:14:43 info: do_state_transition: All 1 cluster nodes are eligible to run resources.
crmd[3142]: 2009/08/12_13:14:43 info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
crmd[3142]: 2009/08/12_13:14:43 info: attrd_update: Sent update: (null)=(null) for localhost
attrd[3141]: 2009/08/12_13:14:43 info: attrd_local_callback: Sending full refresh (origin=crmd)
crmd[3142]: 2009/08/12_13:14:43 info: crm_update_quorum: Updating quorum status to true (call=22)
attrd[3141]: 2009/08/12_13:14:43 info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
crmd[3142]: 2009/08/12_13:14:43 info: abort_transition_graph: do_te_invoke:190 - Triggered transition abort (complete=1) : Peer Cancelled
crmd[3142]: 2009/08/12_13:14:43 info: do_pe_invoke: Query 23: Requesting the current CIB: S_POLICY_ENGINE
attrd[3141]: 2009/08/12_13:14:43 info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
cib[3137]: 2009/08/12_13:14:43 info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/20, version=0.7.1): ok (rc=0)
crmd[3142]: 2009/08/12_13:14:43 info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
crmd[3142]: 2009/08/12_13:14:43 info: need_abort: Aborting on change to admin_epoch
crmd[3142]: 2009/08/12_13:14:43 info: do_pe_invoke: Query 24: Requesting the current CIB: S_POLICY_ENGINE
cib[3137]: 2009/08/12_13:14:43 info: log_data_element: cib:diff: - <cib admin_epoch="0" epoch="7" num_updates="1" />
cib[3137]: 2009/08/12_13:14:43 info: log_data_element: cib:diff: + <cib dc-uuid="7c6eb9f6-c938-4012-8ba0-4d5c21dd1315" admin_epoch="0" epoch="8" num_updates="1" />
cib[3137]: 2009/08/12_13:14:43 info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/22, version=0.8.1): ok (rc=0)
crmd[3142]: 2009/08/12_13:14:43 info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1250075683-7, seq=1, quorate=1
pengine[3145]: 2009/08/12_13:14:43 info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
pengine[3145]: 2009/08/12_13:14:43 ERROR: unpack_resources: No STONITH resources have been defined
pengine[3145]: 2009/08/12_13:14:43 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
pengine[3145]: 2009/08/12_13:14:43 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
pengine[3145]: 2009/08/12_13:14:43 info: determine_online_status: Node server3 is online
pengine[3145]: 2009/08/12_13:14:43 info: stage6: Delaying fencing operations until there are resources to manage
cib[3146]: 2009/08/12_13:14:43 info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-10.raw
cib[3146]: 2009/08/12_13:14:43 info: write_cib_contents: Wrote version 0.8.0 of the CIB to disk (digest: a037845ebd38e720357a1dbf3e1faf68)
crmd[3142]: 2009/08/12_13:14:36 WARN: cib_client_add_notify_callback: Callback already present
crmd[3142]: 2009/08/12_13:14:36 info: set_graph_functions: Setting custom graph functions
crmd[3142]: 2009/08/12_13:14:36 info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
crmd[3142]: 2009/08/12_13:14:36 info: start_subsystem: Starting sub-system "pengine"
pengine[3145]: 2009/08/12_13:14:36 info: main: Starting pengine
crmd[3142]: 2009/08/12_13:14:39 info: do_dc_takeover: Taking over DC status for this partition
cib[3137]: 2009/08/12_13:14:39 info: cib_process_readwrite: We are now in R/W mode
cib[3137]: 2009/08/12_13:14:39 info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/6, version=0.7.0): ok (rc=0)
cib[3137]: 2009/08/12_13:14:39 info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/7, version=0.7.0): ok (rc=0)
cib[3137]: 2009/08/12_13:14:39 info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/9, version=0.7.0): ok (rc=0)
crmd[3142]: 2009/08/12_13:14:39 info: join_make_offer: Making join offers based on membership 1
crmd[3142]: 2009/08/12_13:14:39 info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
crmd[3142]: 2009/08/12_13:14:39 info: te_connect_stonith: Attempting connection to fencing daemon...
cib[3137]: 2009/08/12_13:14:39 info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/11, version=0.7.0): ok (rc=0)
crmd[3142]: 2009/08/12_13:14:40 info: te_connect_stonith: Connected
crmd[3142]: 2009/08/12_13:14:40 info: config_query_callback: Checking for expired actions every 900000ms
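
PS: since the log is long, here is a small Python sketch that pulls out only the WARN and ERROR entries, which is how the "Could not calculate UUID for server2" line stands out. It assumes every entry starts with "daemon[pid]: date_time LEVEL:" as in the log above; the helper names split_entries and problems are my own and not part of heartbeat or pacemaker. The sample text is three entries copied from the log.

```python
import re

# Assumed entry prefix, taken from the pasted log:
#   daemon[pid]: YYYY/MM/DD_HH:MM:SS LEVEL:
ENTRY_START = re.compile(
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: "
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Za-z]+): "
)


def split_entries(text):
    """Return (daemon, stamp, level, message) tuples found in text.

    Works whether entries are one per line or run together on one line,
    because each entry is cut at the start of the next prefix.
    """
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse whitespace so wrapped messages become a single line.
        message = " ".join(text[m.end():end].split())
        entries.append((m.group("daemon"), m.group("stamp"),
                        m.group("level"), message))
    return entries


def problems(text, levels=("WARN", "ERROR")):
    """Keep only the entries whose level is in levels."""
    return [entry for entry in split_entries(text) if entry[2] in levels]


# Three entries copied from the log above, run together on purpose.
sample = (
    "crmd[3142]: 2009/08/12_13:14:41 info: populate_cib_nodes_ha: "
    "Requesting the list of configured nodes "
    "crmd[3142]: 2009/08/12_13:14:43 WARN: get_uuid: "
    "Could not calculate UUID for server2 "
    "crmd[3142]: 2009/08/12_13:14:43 WARN: populate_cib_nodes_ha: "
    "Node server2: no uuid found"
)

for daemon, stamp, level, message in problems(sample):
    print(stamp, daemon, level, message)
```

To run it on a whole log, read the file into a string and pass that to problems() instead of the sample.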
