Oops - forgot to mention HB version - 2.1.0; really just looking for pointers to help me debug here...
Simon

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:linux-ha-[EMAIL PROTECTED]] On Behalf Of Graham, Simon
> Sent: Monday, December 03, 2007 2:33 PM
> To: General Linux-HA mailing list
> Subject: [Linux-HA] HB sometimes forgets to migrate my resources...
>
> I've been seeing an occasional problem where resources are not restarted
> when the node running them is powered off. In the specific case in
> question, I have a 2-node cluster (with STONITH), the resources are
> running on one node (node1), and the DC is on the other node (node0) --
> after power-cycling node1, I see the attached in the ha log file and
> then nothing else until about 3 minutes later, when the other node comes
> back.
>
> Any pointers to what to look at would be appreciated... it seems to me
> that it is related to the order in which the various events are processed,
> but I can't quite follow the code...
>
> Thanks,
> Simon
>
> Nov 30 13:40:03 node0 heartbeat: [19665]: WARN: node node1: is dead
> Nov 30 13:40:03 node0 heartbeat: [19665]: info: Link node1:priv0 dead.
> Nov 30 13:40:03 node0 heartbeat: [19665]: info: Link node1:biz0 dead.
> Nov 30 13:40:03 node0 crmd: [24367]: notice: crmd_ha_status_callback: Status update: Node node1 now has status [dead]
> Nov 30 13:40:03 node0 cib: [24362]: info: cib_diff_notify: Local-only Change (client:24367, call: 66): 0.2.51 (ok)
> Nov 30 13:40:03 node0 ccm: [24360]: debug: quorum plugin: majority
> Nov 30 13:40:03 node0 ccm: [24360]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
> Nov 30 13:40:03 node0 ccm: [24360]: debug: total_node_count=2, total_quorum_votes=200
> Nov 30 13:40:03 node0 ccm: [24360]: debug: quorum plugin: twonodes
> Nov 30 13:40:03 node0 ccm: [24360]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
> Nov 30 13:40:03 node0 ccm: [24360]: debug: total_node_count=2, total_quorum_votes=200
> Nov 30 13:40:03 node0 ccm: [24360]: info: Break tie for 2 nodes cluster
> Nov 30 13:40:03 node0 tengine: [25399]: info: te_update_diff: Processing diff (cib_update): 0.2.51 -> 0.2.51
> Nov 30 13:40:03 node0 tengine: [25399]: WARN: match_down_event: No match for shutdown action on 44454c4c-3700-104e-8035-b5c04f504431
> Nov 30 13:40:03 node0 crmd: [24367]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> Nov 30 13:40:03 node0 tengine: [25399]: info: extract_event: Stonith/shutdown of 44454c4c-3700-104e-8035-b5c04f504431 not matched
> Nov 30 13:40:03 node0 crmd: [24367]: info: mem_handle_event: no mbr_track info
> Nov 30 13:40:03 node0 tengine: [25399]: info: update_abort_priority: Abort priority upgraded to 1000000
> Nov 30 13:40:03 node0 crmd: [24367]: info: do_state_transition: node0: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> Nov 30 13:40:03 node0 tengine: [25399]: info: te_update_diff: Aborting on transient_attributes deletions
> Nov 30 13:40:03 node0 crmd: [24367]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> Nov 30 13:40:03 node0 cib: [24362]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
> Nov 30 13:40:03 node0 cib: [24362]: info: mem_handle_event: no mbr_track info
> Nov 30 13:40:03 node0 cib: [24362]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> Nov 30 13:40:03 node0 cib: [24362]: info: mem_handle_event: instance=4, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> Nov 30 13:40:03 node0 cib: [24362]: info: cib_ccm_msg_callback: LOST: node1
> Nov 30 13:40:03 node0 cib: [24362]: info: cib_ccm_msg_callback: PEER: node0
> Nov 30 13:40:03 node0 crmd: [24367]: info: do_pe_invoke_callback: Waiting for another CCM event before proceeding: CIB=4 > CRM=3
> Nov 30 13:40:03 node0 crmd: [24367]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> Nov 30 13:40:03 node0 crmd: [24367]: info: mem_handle_event: instance=4, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> Nov 30 13:40:03 node0 crmd: [24367]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=4)
> Nov 30 13:40:03 node0 crmd: [24367]: info: ccm_event_detail: NEW MEMBERSHIP: trans=4, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> Nov 30 13:40:03 node0 crmd: [24367]: info: ccm_event_detail: CURRENT: node0 [nodeid=0, born=4]
> Nov 30 13:40:03 node0 crmd: [24367]: info: ccm_event_detail: LOST: node1 [nodeid=1, born=3]
> Nov 30 13:40:04 node0 cib: [24362]: info: cib_diff_notify: Local-only Change (client:24367, call: 69): 0.2.51 (ok)
> Nov 30 13:40:04 node0 tengine: [25399]: info: te_update_diff: Processing diff (cib_update): 0.2.51 -> 0.2.51
> Nov 30 13:40:04 node0 cib: [907]: info: write_cib_contents: Wrote version 0.2.51 of the CIB to disk (digest: a0c6f0b8d31bfac96182e1bc2d02cad0)
> Nov 30 13:40:04 node0 cib: [908]: info: write_cib_contents: Wrote version 0.2.51 of the CIB to disk (digest: 66b77c6223da9fcedcadcbac04fd0d67)
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
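(As a debugging aid for logs like the one above: the tengine, crmd, ccm, and cib messages arrive interleaved, which makes the event ordering hard to follow by eye. A small sketch of one way to pull each log line apart by daemon so the per-process sequences can be compared -- this is a hypothetical helper, not part of Heartbeat, and the regex simply assumes the syslog-style format shown in the log excerpt above.)

```python
import re

# ha-log lines in the excerpt look like:
#   Nov 30 13:40:03 node0 tengine: [25399]: WARN: match_down_event: ...
LOG_RE = re.compile(
    r"^(?P<ts>\w+ \d+ [\d:]+) (?P<host>\S+) "
    r"(?P<daemon>\w+): \[(?P<pid>\d+)\]: (?P<level>\w+): (?P<msg>.*)$"
)

def parse(lines):
    """Yield (timestamp, daemon, level, message) for each line that matches."""
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            yield m.group("ts"), m.group("daemon"), m.group("level"), m.group("msg")

# Two lines copied from the log above, as sample input.
sample = [
    "Nov 30 13:40:03 node0 heartbeat: [19665]: WARN: node node1: is dead",
    "Nov 30 13:40:03 node0 tengine: [25399]: WARN: match_down_event: No match "
    "for shutdown action on 44454c4c-3700-104e-8035-b5c04f504431",
]

for ts, daemon, level, msg in parse(sample):
    print(f"{ts} {daemon:>9} {level:>5} {msg}")
```

Grouping the parsed tuples by daemon (or filtering on WARN-level messages such as the match_down_event one) makes it easier to see whether the tengine processed the node-death update before the CCM membership event arrived.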
