please use hb_report for things like this.
we'd need logs from both nodes (and trying to read line-wrapped logs
is near impossible)

On Wed, Aug 12, 2009 at 1:30 PM, Alain.Moulle<[email protected]> wrote:
> Hi ,
> I'm facing a pb at first start on my two-nodes ha cluster:  server2 and
> server3 (both with fedora11)
> The start on server2 seems to work fine, and on hb_gui I can see server2
> OnLine and
> server3 Offline
> But the start on server3 seems to not connect to the cluster, and at the
> end crm_mon displays
> only server3, no mention of server2 (in the log, seems that it fails to
> calculate server2 uuid)...
> A ping between both heartbeat interfaces works fine ... so I have no
> idea why it does not
> work ...
> Any idea ? (Log is below)
> Thanks for help.
> Alain
>
>
> cib[3137]: 2009/08/12_13:14:17 info: register_heartbeat_conn: Hostname:
> server3
> cib[3137]: 2009/08/12_13:14:17 info: register_heartbeat_conn: UUID:
> 7c6eb9f6-c938-4012-8ba0-4d5c21dd1315
> cib[3137]: 2009/08/12_13:14:17 info: ccm_connect: Registering with CCM...
> cib[3137]: 2009/08/12_13:14:17 WARN: ccm_connect: CCM Activation failed
> cib[3137]: 2009/08/12_13:14:17 WARN: ccm_connect: CCM Connection failed
> 1 times (30 max)
> stonithd[3139]: 2009/08/12_13:14:17 notice:
> /usr/lib64/heartbeat/stonithd start up successfully.
> stonithd[3139]: 2009/08/12_13:14:17 info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> mgmtd[3135]: 2009/08/12_13:14:18 info: init_crm: live
> mgmtd[3135]: 2009/08/12_13:14:18 info: login to cib live: 0, ret:-10
> crmd[3142]: 2009/08/12_13:14:18 info: do_cib_control: Could not connect
> to the CIB service: connection failed
> crmd[3142]: 2009/08/12_13:14:18 WARN: do_cib_control: Couldn't complete
> CIB registration 1 times... pause and retry
> crmd[3142]: 2009/08/12_13:14:18 info: crmd_init: Starting crmd's mainloop
> mgmtd[3135]: 2009/08/12_13:14:19 info: login to cib live: 1, ret:-10
> mgmtd[3135]: 2009/08/12_13:14:20 info: login to cib live: 2, ret:-10
> crmd[3142]: 2009/08/12_13:14:20 info: crm_timer_popped: Wait Timer
> (I_NULL) just popped!
> cib[3137]: 2009/08/12_13:14:20 info: ccm_connect: Registering with CCM...
> cib[3137]: 2009/08/12_13:14:20 WARN: ccm_connect: CCM Activation failed
> cib[3137]: 2009/08/12_13:14:20 WARN: ccm_connect: CCM Connection failed
> 2 times (30 max)
> mgmtd[3135]: 2009/08/12_13:14:21 info: login to cib live: 3, ret:-10
> crmd[3142]: 2009/08/12_13:14:21 info: do_cib_control: Could not connect
> to the CIB service: connection failed
> crmd[3142]: 2009/08/12_13:14:21 WARN: do_cib_control: Couldn't complete
> CIB registration 2 times... pause and retry
> ccm[3136]: 2009/08/12_13:14:21 debug: quorum plugin: majority
> ccm[3136]: 2009/08/12_13:14:21 debug: cluster:linux-ha, member_count=1,
> member_quorum_votes=100
> ccm[3136]: 2009/08/12_13:14:21 debug: total_node_count=2,
> total_quorum_votes=200
> ccm[3136]: 2009/08/12_13:14:21 debug: quorum plugin: twonodes
> ccm[3136]: 2009/08/12_13:14:21 debug: cluster:linux-ha, member_count=1,
> member_quorum_votes=100
> ccm[3136]: 2009/08/12_13:14:21 debug: total_node_count=2,
> total_quorum_votes=200
> ccm[3136]: 2009/08/12_13:14:21 info: Break tie for 2 nodes cluster
> ccm[3136]: 2009/08/12_13:14:21 info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> mgmtd[3135]: 2009/08/12_13:14:22 info: login to cib live: 4, ret:-10
> mgmtd[3135]: 2009/08/12_13:14:23 info: login to cib failed: live
> mgmtd[3135]: 2009/08/12_13:14:23 ERROR: Can't initialize management
> library.Shutting down.(-1)
> heartbeat[3127]: 2009/08/12_13:14:23 WARN: Managed
> /usr/lib64/heartbeat/mgmtd -v -t process 3135 exited with return code 1.
> heartbeat[3127]: 2009/08/12_13:14:23 ERROR: Respawning client
> "/usr/lib64/heartbeat/mgmtd -v -t":
> heartbeat[3127]: 2009/08/12_13:14:23 info: Starting child client
> "/usr/lib64/heartbeat/mgmtd -v -t" (0,0)
> crmd[3142]: 2009/08/12_13:14:23 info: crm_timer_popped: Wait Timer
> (I_NULL) just popped!
> cib[3137]: 2009/08/12_13:14:23 info: ccm_connect: Registering with CCM...
> cib[3137]: 2009/08/12_13:14:23 info: cib_init: Requesting the list of
> configured nodes
> cib[3137]: 2009/08/12_13:14:23 info: cib_init: Starting cib mainloop
> cib[3137]: 2009/08/12_13:14:23 info: cib_client_status_callback: Status
> update: Client server3/cib now has status [join]
> cib[3137]: 2009/08/12_13:14:23 info: crm_new_peer: Node 0 is now known
> as server3
> cib[3137]: 2009/08/12_13:14:23 info: crm_update_peer_proc: server3.cib
> is now online
> cib[3137]: 2009/08/12_13:14:23 info: mem_handle_event: Got an event
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> cib[3137]: 2009/08/12_13:14:23 info: mem_handle_event: instance=1,
> nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
> cib[3137]: 2009/08/12_13:14:23 info: cib_ccm_msg_callback: Processing
> CCM event=NEW MEMBERSHIP (id=1)
> cib[3137]: 2009/08/12_13:14:23 info: crm_get_peer: Node server3 now has
> id: 1
> cib[3137]: 2009/08/12_13:14:23 info: crm_update_peer: Node server3: id=1
> state=member (new) addr=(null) votes=-1 born=1 seen=1
> proc=00000000000000000000000000000100
> cib[3137]: 2009/08/12_13:14:23 info: crm_update_peer_proc: server3.ais
> is now online
> cib[3137]: 2009/08/12_13:14:23 info: crm_update_peer_proc: server3.crmd
> is now online
> cib[3144]: 2009/08/12_13:14:23 info: write_cib_contents: Archived
> previous version as /var/lib/heartbeat/crm/cib-9.raw
> cib[3144]: 2009/08/12_13:14:23 info: write_cib_contents: Wrote version
> 0.7.0 of the CIB to disk (digest: c41e9acb00ec2c131a3375b2b6b0faec)
> cib[3144]: 2009/08/12_13:14:23 info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.Mx3kVH (digest:
> /var/lib/heartbeat/crm/cib.hewZvu)
> cib[3137]: 2009/08/12_13:14:24 info: cib_client_status_callback: Status
> update: Client server3/cib now has status [online]
> heartbeat[3143]: 2009/08/12_13:14:24 info: Starting
> "/usr/lib64/heartbeat/mgmtd -v -t" as uid 0  gid 0 (pid 3143)
> mgmtd[3143]: 2009/08/12_13:14:24 info: Pacemaker-mgmt Hg Version:
> 1f7d1be39d34c38d64d971ac79a21339917fba4a
> mgmtd[3143]: 2009/08/12_13:14:24 info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> mgmtd[3143]: 2009/08/12_13:14:24 debug: Enabling coredumps
> mgmtd[3143]: 2009/08/12_13:14:24 info: G_main_add_SignalHandler: Added
> signal handler for signal 10
> mgmtd[3143]: 2009/08/12_13:14:24 info: G_main_add_SignalHandler: Added
> signal handler for signal 12
> mgmtd[3143]: 2009/08/12_13:14:24 info: init_crm: live
> attrd[3141]: 2009/08/12_13:14:24 info: cib_connect: Connected to the CIB
> after 7 signon attempts
> attrd[3141]: 2009/08/12_13:14:24 info: cib_connect: Sending full refresh
> crmd[3142]: 2009/08/12_13:14:24 info: do_cib_control: CIB connection
> established
> crmd[3142]: 2009/08/12_13:14:24 info: crm_cluster_connect: Connecting to
> Heartbeat
> crmd[3142]: 2009/08/12_13:14:24 info: register_heartbeat_conn: Hostname:
> server3
> crmd[3142]: 2009/08/12_13:14:24 info: register_heartbeat_conn: UUID:
> 7c6eb9f6-c938-4012-8ba0-4d5c21dd1315
> mgmtd[3143]: 2009/08/12_13:14:24 debug: main: run the loop...
> mgmtd[3143]: 2009/08/12_13:14:24 info: Started.
> crmd[3142]: 2009/08/12_13:14:25 info: do_ha_control: Connected to the
> cluster
> crmd[3142]: 2009/08/12_13:14:25 info: do_ccm_control: CCM connection
> established... waiting for first callback
> crmd[3142]: 2009/08/12_13:14:25 info: do_started: Delaying start, CCM
> (0000000000100000) not connected
> crmd[3142]: 2009/08/12_13:14:25 info: config_query_callback: Checking
> for expired actions every 900000ms
> crmd[3142]: 2009/08/12_13:14:25 notice: crmd_client_status_callback:
> Status update: Client server3/crmd now has status [online] (DC=false)
> crmd[3142]: 2009/08/12_13:14:25 info: crm_new_peer: Node 0 is now known
> as server3
> crmd[3142]: 2009/08/12_13:14:25 info: crm_update_peer_proc: server3.crmd
> is now online
> crmd[3142]: 2009/08/12_13:14:25 info: crmd_client_status_callback: Not
> the DC
> crmd[3142]: 2009/08/12_13:14:25 notice: crmd_client_status_callback:
> Status update: Client server3/crmd now has status [online] (DC=false)
> crmd[3142]: 2009/08/12_13:14:25 info: crmd_client_status_callback: Not
> the DC
> crmd[3142]: 2009/08/12_13:14:25 info: mem_handle_event: Got an event
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> crmd[3142]: 2009/08/12_13:14:25 info: mem_handle_event: instance=1,
> nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
> crmd[3142]: 2009/08/12_13:14:25 info: crmd_ccm_msg_callback: Quorum
> (re)attained after event=NEW MEMBERSHIP (id=1)
> crmd[3142]: 2009/08/12_13:14:25 info: ccm_event_detail: NEW MEMBERSHIP:
> trans=1, nodes=1, new=1, lost=0 n_idx=0, new_idx=0, old_idx=3
> crmd[3142]: 2009/08/12_13:14:25 info: ccm_event_detail:         CURRENT:
> server3 [nodeid=1, born=1]
> crmd[3142]: 2009/08/12_13:14:25 info: ccm_event_detail:         NEW:
> server3 [nodeid=1, born=1]
> crmd[3142]: 2009/08/12_13:14:25 info: crm_get_peer: Node server3 now has
> id: 1
> crmd[3142]: 2009/08/12_13:14:25 info: crm_update_peer: Node server3:
> id=1 state=member (new) addr=(null) votes=-1 born=1 seen=1
> proc=00000000000000000000000000000200
> crmd[3142]: 2009/08/12_13:14:25 info: crm_update_peer_proc: server3.ais
> is now online
> crmd[3142]: 2009/08/12_13:14:25 info: do_started: The local CRM is
> operational
> crmd[3142]: 2009/08/12_13:14:25 info: do_state_transition: State
> transition S_STARTING -> S_PENDING [ input=I_PENDING
> cause=C_FSA_INTERNAL origin=do_started ]
> crmd[3142]: 2009/08/12_13:14:36 info: crm_timer_popped: Election Trigger
> (I_DC_TIMEOUT) just popped!
> crmd[3142]: 2009/08/12_13:14:36 WARN: do_log: FSA: Input I_DC_TIMEOUT
> from crm_timer_popped() received in state S_PENDING
> crmd[3142]: 2009/08/12_13:14:36 info: do_state_transition: State
> transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
> cause=C_TIMER_POPPED origin=crm_timer_popped
> crmd[3142]: 2009/08/12_13:14:36 info: do_state_transition: State
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> crmd[3142]: 2009/08/12_13:14:36 info: do_te_control: Registering TE
> UUID: ecf08a7c-6fb8-43e0-a8b6-011ab83ad5f7
> crmd[3142]: 2009/08/12_13:14:40 info: update_dc: Set DC to server3 (3.0.1)
> crmd[3142]: 2009/08/12_13:14:41 info: do_state_transition: State
> transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
> cause=C_FSA_INTERNAL origin=check_join_state ]
> crmd[3142]: 2009/08/12_13:14:41 info: do_state_transition: All 1 cluster
> nodes responded to the join offer.
> crmd[3142]: 2009/08/12_13:14:41 info: do_dc_join_finalize: join-1:
> Syncing the CIB from server3 to the rest of the cluster
> cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation
> complete: op cib_sync for section 'all' (origin=local/crmd/14,
> version=0.7.0): ok (rc=0)
> cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation
> complete: op cib_modify for section nodes (origin=local/crmd/15,
> version=0.7.0): ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:41 info: update_attrd: Connecting to attrd...
> crmd[3142]: 2009/08/12_13:14:41 info: attrd_update: Sent update:
> terminate=(null) for server3
> crmd[3142]: 2009/08/12_13:14:41 info: attrd_update: Sent update:
> shutdown=(null) for server3
> attrd[3141]: 2009/08/12_13:14:41 info: find_hash_entry: Creating hash
> entry for terminate
> attrd[3141]: 2009/08/12_13:14:41 info: find_hash_entry: Creating hash
> entry for shutdown
> crmd[3142]: 2009/08/12_13:14:41 info: do_dc_join_ack: join-1: Updating
> node state to member for server3
> cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation
> complete: op cib_delete for section
> //node_sta...@uname='server3']/transient_attributes
> (origin=local/crmd/16, version=0.7.0): ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:41 info: erase_xpath_callback: Deletion of
> "//node_sta...@uname='server3']/transient_attributes": ok (rc=0)
> cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation
> complete: op cib_delete for section //node_sta...@uname='server3']/lrm
> (origin=local/crmd/17, version=0.7.0): ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:41 info: erase_xpath_callback: Deletion of
> "//node_sta...@uname='server3']/lrm": ok (rc=0)
> cib[3137]: 2009/08/12_13:14:41 info: cib_process_request: Operation
> complete: op cib_delete for section //node_sta...@uname='server3']/lrm
> (origin=local/crmd/18, version=0.7.0): ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:41 info: erase_xpath_callback: Deletion of
> "//node_sta...@uname='server3']/lrm": ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:41 info: do_state_transition: State
> transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
> cause=C_FSA_INTERNAL origin=check_join_state ]
> crmd[3142]: 2009/08/12_13:14:41 info: populate_cib_nodes_ha: Requesting
> the list of configured nodes
> crmd[3142]: 2009/08/12_13:14:43 WARN: get_uuid: Could not calculate UUID
> for server2
> crmd[3142]: 2009/08/12_13:14:43 WARN: populate_cib_nodes_ha: Node
> server2: no uuid found
> crmd[3142]: 2009/08/12_13:14:43 info: do_state_transition: All 1 cluster
> nodes are eligible to run resources.
> crmd[3142]: 2009/08/12_13:14:43 info: do_dc_join_final: Ensuring DC,
> quorum and node attributes are up-to-date
> crmd[3142]: 2009/08/12_13:14:43 info: attrd_update: Sent update:
> (null)=(null) for localhost
> attrd[3141]: 2009/08/12_13:14:43 info: attrd_local_callback: Sending
> full refresh (origin=crmd)
> crmd[3142]: 2009/08/12_13:14:43 info: crm_update_quorum: Updating quorum
> status to true (call=22)
> attrd[3141]: 2009/08/12_13:14:43 info: attrd_trigger_update: Sending
> flush op to all hosts for: shutdown (<null>)
> crmd[3142]: 2009/08/12_13:14:43 info: abort_transition_graph:
> do_te_invoke:190 - Triggered transition abort (complete=1) : Peer Cancelled
> crmd[3142]: 2009/08/12_13:14:43 info: do_pe_invoke: Query 23: Requesting
> the current CIB: S_POLICY_ENGINE
> attrd[3141]: 2009/08/12_13:14:43 info: attrd_trigger_update: Sending
> flush op to all hosts for: terminate (<null>)
> cib[3137]: 2009/08/12_13:14:43 info: cib_process_request: Operation
> complete: op cib_modify for section nodes (origin=local/crmd/20,
> version=0.7.1): ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:43 info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> crmd[3142]: 2009/08/12_13:14:43 info: need_abort: Aborting on change to
> admin_epoch
> crmd[3142]: 2009/08/12_13:14:43 info: do_pe_invoke: Query 24: Requesting
> the current CIB: S_POLICY_ENGINE
> cib[3137]: 2009/08/12_13:14:43 info: log_data_element: cib:diff: - <cib
> admin_epoch="0" epoch="7" num_updates="1" />
> cib[3137]: 2009/08/12_13:14:43 info: log_data_element: cib:diff: + <cib
> dc-uuid="7c6eb9f6-c938-4012-8ba0-4d5c21dd1315" admin_epoch="0" epoch="8"
> num_updates="1" />
> cib[3137]: 2009/08/12_13:14:43 info: cib_process_request: Operation
> complete: op cib_modify for section cib (origin=local/crmd/22,
> version=0.8.1): ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:43 info: do_pe_invoke_callback: Invoking
> the PE: ref=pe_calc-dc-1250075683-7, seq=1, quorate=1
> pengine[3145]: 2009/08/12_13:14:43 info: unpack_config: Node scores:
> 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> pengine[3145]: 2009/08/12_13:14:43 ERROR: unpack_resources: No STONITH
> resources have been defined
> pengine[3145]: 2009/08/12_13:14:43 ERROR: unpack_resources: Either
> configure some or disable STONITH with the stonith-enabled option
> pengine[3145]: 2009/08/12_13:14:43 ERROR: unpack_resources: NOTE:
> Clusters with shared data need STONITH to ensure data integrity
> pengine[3145]: 2009/08/12_13:14:43 info: determine_online_status: Node
> server3 is online
> pengine[3145]: 2009/08/12_13:14:43 info: stage6: Delaying fencing
> operations until there are resources to manage
> cib[3146]: 2009/08/12_13:14:43 info: write_cib_contents: Archived
> previous version as /var/lib/heartbeat/crm/cib-10.raw
> cib[3146]: 2009/08/12_13:14:43 info: write_cib_contents: Wrote version
> 0.8.0 of the CIB to disk (digest: a037845ebd38e720357a1dbf3e1faf68)
>
> crmd[3142]: 2009/08/12_13:14:36 WARN: cib_client_add_notify_callback:
> Callback already present
> crmd[3142]: 2009/08/12_13:14:36 info: set_graph_functions: Setting
> custom graph functions
> crmd[3142]: 2009/08/12_13:14:36 info: unpack_graph: Unpacked transition
> -1: 0 actions in 0 synapses
> crmd[3142]: 2009/08/12_13:14:36 info: start_subsystem: Starting
> sub-system "pengine"
> pengine[3145]: 2009/08/12_13:14:36 info: main: Starting pengine
> crmd[3142]: 2009/08/12_13:14:39 info: do_dc_takeover: Taking over DC
> status for this partition
> cib[3137]: 2009/08/12_13:14:39 info: cib_process_readwrite: We are now
> in R/W mode
> cib[3137]: 2009/08/12_13:14:39 info: cib_process_request: Operation
> complete: op cib_master for section 'all' (origin=local/crmd/6,
> version=0.7.0): ok (rc=0)
> cib[3137]: 2009/08/12_13:14:39 info: cib_process_request: Operation
> complete: op cib_modify for section cib (origin=local/crmd/7,
> version=0.7.0): ok (rc=0)
> cib[3137]: 2009/08/12_13:14:39 info: cib_process_request: Operation
> complete: op cib_modify for section crm_config (origin=local/crmd/9,
> version=0.7.0): ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:39 info: join_make_offer: Making join
> offers based on membership 1
> crmd[3142]: 2009/08/12_13:14:39 info: do_dc_join_offer_all: join-1:
> Waiting on 1 outstanding join acks
> crmd[3142]: 2009/08/12_13:14:39 info: te_connect_stonith: Attempting
> connection to fencing daemon...
> cib[3137]: 2009/08/12_13:14:39 info: cib_process_request: Operation
> complete: op cib_modify for section crm_config (origin=local/crmd/11,
> version=0.7.0): ok (rc=0)
> crmd[3142]: 2009/08/12_13:14:40 info: te_connect_stonith: Connected
> crmd[3142]: 2009/08/12_13:14:40 info: config_query_callback: Checking
> for expired actions every 900000ms
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to