On Wed, Jan 21, 2009 at 17:47, Darren Mansell
<[email protected]> wrote:
> Hi all.
>
> I've got a 2-node cluster set up using Heartbeat, CRM and Ldirectord.
>
> Most of the time it works great, but seemingly at random, resources
> won't fail over to the other node.

It looks related to the use of quorumd - I didn't think we shipped
that with any version of SLES.
In any case, I've no idea how it works :-)

> When this happens I can't shut Heartbeat down; I have
> to kill -9 the heartbeat master control process before I can restart it. I
> have to do this frequently just to restart Heartbeat anyway.
>
> The problem seems to occur if I leave the server alone for a few
> hours. If I test it over and over, it's fine.
>
> When the problem occurs this is what it logs on the node still up:
>
> Jan 21 15:58:52 ogg-dvla-02 crmd: [3789]: notice: 
> crmd_client_status_callback: Status update: Client ogg-dvla-01/crmd now has 
> status [offline]
> Jan 21 15:58:52 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: Got an 
> event OC_EV_MS_NOT_PRIMARY from ccm
> Jan 21 15:58:52 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: 
> instance=61, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
> Jan 21 15:58:52 ogg-dvla-02 crmd: [3789]: info: crmd_ccm_msg_callback: Quorum 
> lost after event=NOT PRIMARY (id=61)
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: info: mem_handle_event: Got an event 
> OC_EV_MS_NOT_PRIMARY from ccm
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: info: mem_handle_event: instance=61, 
> nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: info: apply_xml_diff: Digest 
> mis-match: expected 3e6e45302914a4cebbbd69f9dacc7426, calculated 
> 88d796cb9934a600c01584cc89659117
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: info: cib_process_diff: Diff 0.61.28 
> -> 0.61.29 not applied to 0.61.28: Failed application of a global update.  
> Requesting full refresh.
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: info: cib_process_diff: Requesting 
> re-sync from peer: Failed application of a global update.  Requesting full 
> refresh.
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: do_cib_notify: cib_apply_diff 
> of <diff > FAILED: Application of an update diff failed, requesting a full 
> refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_request: 
> cib_apply_diff operation failed: Application of an update diff failed, 
> requesting a full refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_diff: Not applying 
> diff 0.61.29 -> 0.61.30 (sync in progress)
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: do_cib_notify: cib_apply_diff 
> of <diff > FAILED: Application of an update diff failed, requesting a full 
> refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_request: 
> cib_apply_diff operation failed: Application of an update diff failed, 
> requesting a full refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_diff: Not applying 
> diff 0.61.30 -> 0.61.31 (sync in progress)
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: do_cib_notify: cib_apply_diff 
> of <diff > FAILED: Application of an update diff failed, requesting a full 
> refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_request: 
> cib_apply_diff operation failed: Application of an update diff failed, 
> requesting a full refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_diff: Not applying 
> diff 0.61.31 -> 0.61.32 (sync in progress)
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: do_cib_notify: cib_apply_diff 
> of <diff > FAILED: Application of an update diff failed, requesting a full 
> refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_request: 
> cib_apply_diff operation failed: Application of an update diff failed, 
> requesting a full refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_diff: Not applying 
> diff 0.61.32 -> 0.61.33 (sync in progress)
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: do_cib_notify: cib_apply_diff 
> of <diff > FAILED: Application of an update diff failed, requesting a full 
> refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: WARN: cib_process_request: 
> cib_apply_diff operation failed: Application of an update diff failed, 
> requesting a full refresh
> Jan 21 15:58:52 ogg-dvla-02 cib: [3785]: info: cib_client_status_callback: 
> Status update: Client ogg-dvla-01/cib now has status [leave]
> Jan 21 15:58:57 ogg-dvla-02 ccm: [3784]: debug: quorum plugin: majority
> Jan 21 15:58:57 ogg-dvla-02 ccm: [3784]: debug: cluster:linux-ha, 
> member_count=1, member_quorum_votes=100
> Jan 21 15:58:57 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: Got an 
> event OC_EV_MS_INVALID from ccm
> Jan 21 15:58:57 ogg-dvla-02 cib: [3785]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Jan 21 15:58:57 ogg-dvla-02 ccm: [3784]: debug: total_node_count=2, 
> total_quorum_votes=200
> Jan 21 15:58:57 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: no 
> mbr_track info
> Jan 21 15:58:57 ogg-dvla-02 cib: [3785]: info: mem_handle_event: no mbr_track 
> info
> Jan 21 15:58:57 ogg-dvla-02 ccm: [3784]: debug: quorum plugin: twonodes
> Jan 21 15:58:57 ogg-dvla-02 cib: [3785]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Jan 21 15:58:57 ogg-dvla-02 ccm: [3784]: debug: cluster:linux-ha, 
> member_count=1, member_quorum_votes=100
> Jan 21 15:58:57 ogg-dvla-02 cib: [3785]: info: mem_handle_event: instance=62, 
> nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> Jan 21 15:58:57 ogg-dvla-02 ccm: [3784]: debug: total_node_count=2, 
> total_quorum_votes=200
> Jan 21 15:58:57 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: Got an 
> event OC_EV_MS_NEW_MEMBERSHIP from ccm
> Jan 21 15:58:57 ogg-dvla-02 cib: [3785]: info: cib_ccm_msg_callback: LOST: 
> ogg-dvla-01
> Jan 21 15:58:57 ogg-dvla-02 ccm: [3784]: info: Break tie for 2 nodes cluster
> Jan 21 15:58:57 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: 
> instance=62, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> Jan 21 15:58:57 ogg-dvla-02 cib: [3785]: info: cib_ccm_msg_callback: PEER: 
> ogg-dvla-02
> Jan 21 15:58:57 ogg-dvla-02 crmd: [3789]: info: crmd_ccm_msg_callback: Quorum 
> (re)attained after event=NEW MEMBERSHIP (id=62)
> Jan 21 15:58:57 ogg-dvla-02 crmd: [3789]: info: ccm_event_detail: NEW 
> MEMBERSHIP: trans=62, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> Jan 21 15:58:57 ogg-dvla-02 crmd: [3789]: info: ccm_event_detail:       
> CURRENT: ogg-dvla-02 [nodeid=1, born=62]
> Jan 21 15:58:57 ogg-dvla-02 crmd: [3789]: info: ccm_event_detail:       LOST: 
>    ogg-dvla-01 [nodeid=0, born=1]
> Jan 21 15:59:23 ogg-dvla-02 heartbeat: [3275]: WARN: node ogg-dvla-01: is dead
> Jan 21 15:59:23 ogg-dvla-02 heartbeat: [3275]: info: Link ogg-dvla-01:eth0 
> dead.
> Jan 21 15:59:23 ogg-dvla-02 crmd: [3789]: notice: crmd_ha_status_callback: 
> Status update: Node ogg-dvla-01 now has status [dead]
>
>
> And this is what it logs when it fails over OK:
>
> Jan 21 16:29:31 ogg-dvla-02 cib: [3785]: info: cib_stats: Processed 39 
> operations (1794.00us average, 0% utilization) in the last 10min
> Jan 21 16:32:04 ogg-dvla-02 crmd: [3789]: info: handle_shutdown_request: 
> Creating shutdown request for ogg-dvla-01
> Jan 21 16:32:04 ogg-dvla-02 tengine: [7204]: info: extract_event: Aborting on 
> shutdown attribute for 74195e76-f72c-45a2-aba5-07a0574c4058
> Jan 21 16:32:04 ogg-dvla-02 crmd: [3789]: info: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE 
> origin=route_message ]
> Jan 21 16:32:04 ogg-dvla-02 tengine: [7204]: info: update_abort_priority: 
> Abort priority upgraded to 1000000
> Jan 21 16:32:04 ogg-dvla-02 crmd: [3789]: info: do_state_transition: All 2 
> cluster nodes are eligible to run resources.
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: info: determine_online_status: 
> Node ogg-dvla-01 is shutting down
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: info: determine_online_status: 
> Node ogg-dvla-02 is online
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: group_print: Resource 
> Group: load_balancer
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: native_print:     vip    
>   (ocf::heartbeat:IPaddr2):       Started ogg-dvla-01
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: native_print:     
> ldirector        (ocf::heartbeat:ldirectord):    Started ogg-dvla-01
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: NoRoleChange: Move  
> resource vip   (ogg-dvla-01 -> ogg-dvla-02)
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: StopRsc:   ogg-dvla-01   
>   Stop vip
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: StartRsc:  ogg-dvla-02   
>   Start vip
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: RecurringOp: ogg-dvla-02 
>      vip_monitor_20000
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: NoRoleChange: Move  
> resource ldirector     (ogg-dvla-01 -> ogg-dvla-02)
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: StopRsc:   ogg-dvla-01   
>   Stop ldirector
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: StartRsc:  ogg-dvla-02   
>   Start ldirector
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: notice: RecurringOp: ogg-dvla-02 
>      ldirector_monitor_20000
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: info: stage6: Scheduling Node 
> ogg-dvla-01 for shutdown
> Jan 21 16:32:04 ogg-dvla-02 crmd: [3789]: info: do_state_transition: State 
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
> cause=C_IPC_MESSAGE origin=route_message ]
> Jan 21 16:32:04 ogg-dvla-02 tengine: [7204]: info: unpack_graph: Unpacked 
> transition 11: 12 actions in 12 synapses
> Jan 21 16:32:04 ogg-dvla-02 tengine: [7204]: info: te_pseudo_action: Pseudo 
> action 15 fired and confirmed
> Jan 21 16:32:04 ogg-dvla-02 tengine: [7204]: info: send_rsc_command: 
> Initiating action 10: ldirector_stop_0 on ogg-dvla-01
> Jan 21 16:32:04 ogg-dvla-02 pengine: [7205]: info: process_pe_message: 
> Transition 11: PEngine Input stored in: 
> /var/lib/heartbeat/pengine/pe-input-1809.bz2
> Jan 21 16:32:05 ogg-dvla-02 tengine: [7204]: info: match_graph_event: Action 
> ldirector_stop_0 (10) confirmed on ogg-dvla-01 (rc=0)
> Jan 21 16:32:05 ogg-dvla-02 tengine: [7204]: info: send_rsc_command: 
> Initiating action 7: vip_stop_0 on ogg-dvla-01
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: match_graph_event: Action 
> vip_stop_0 (7) confirmed on ogg-dvla-01 (rc=0)
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: te_pseudo_action: Pseudo 
> action 16 fired and confirmed
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: te_pseudo_action: Pseudo 
> action 3 fired and confirmed
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: te_crm_command: Executing 
> crm-event (18): do_shutdown on ogg-dvla-01
> Jan 21 16:32:06 ogg-dvla-02 crmd: [3789]: info: do_lrm_rsc_op: Performing 
> op=vip_start_0 key=8:11:4f1659a1-c670-4bfc-972a-341fd26c1aba)
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: te_pseudo_action: Pseudo 
> action 13 fired and confirmed
> Jan 21 16:32:06 ogg-dvla-02 lrmd: [3786]: info: rsc:vip: start
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: send_rsc_command: 
> Initiating action 8: vip_start_0 on ogg-dvla-02
> Jan 21 16:32:06 ogg-dvla-02 IPaddr2[18852]: [18887]: INFO: Removing 
> conflicting loopback lo.
> Jan 21 16:32:06 ogg-dvla-02 IPaddr2[18852]: [18888]: INFO: ip -f inet addr 
> delete 10.167.30.76/32 dev lo
> Jan 21 16:32:06 ogg-dvla-02 IPaddr2[18852]: [18890]: INFO: ip -o -f inet addr 
> show lo
> Jan 21 16:32:06 ogg-dvla-02 IPaddr2[18852]: [18892]: INFO: ip route delete 
> 10.167.30.76 dev lo
> Jan 21 16:32:06 ogg-dvla-02 lrmd: [3786]: info: RA output: (vip:start:stderr) 
> RTNETLINK answers: No such process
> Jan 21 16:32:06 ogg-dvla-02 IPaddr2[18852]: [18894]: INFO: ip -f inet addr 
> add 10.167.30.76/25 brd 10.167.30.127 dev eth0
> Jan 21 16:32:06 ogg-dvla-02 IPaddr2[18852]: [18896]: INFO: ip link set eth0 up
> Jan 21 16:32:06 ogg-dvla-02 IPaddr2[18852]: [18898]: INFO: 
> /usr/lib/heartbeat/send_arp -i 200 -r 5 -p 
> /var/run/heartbeat/rsctmp/send_arp/send_arp-10.167.30.76 eth0 10.167.30.76 
> auto not_used not_used
> Jan 21 16:32:06 ogg-dvla-02 crmd: [3789]: info: process_lrm_event: LRM 
> operation vip_start_0 (call=22, rc=0) complete
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: match_graph_event: Action 
> vip_start_0 (8) confirmed on ogg-dvla-02 (rc=0)
> Jan 21 16:32:06 ogg-dvla-02 crmd: [3789]: info: do_lrm_rsc_op: Performing 
> op=vip_monitor_20000 key=9:11:4f1659a1-c670-4bfc-972a-341fd26c1aba)
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: send_rsc_command: 
> Initiating action 9: vip_monitor_20000 on ogg-dvla-02
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: send_rsc_command: 
> Initiating action 11: ldirector_start_0 on ogg-dvla-02
> Jan 21 16:32:06 ogg-dvla-02 crmd: [3789]: info: do_lrm_rsc_op: Performing 
> op=ldirector_start_0 key=11:11:4f1659a1-c670-4bfc-972a-341fd26c1aba)
> Jan 21 16:32:06 ogg-dvla-02 lrmd: [3786]: info: rsc:ldirector: start
> Jan 21 16:32:06 ogg-dvla-02 crmd: [3789]: info: process_lrm_event: LRM 
> operation vip_monitor_20000 (call=23, rc=0) complete
> Jan 21 16:32:06 ogg-dvla-02 tengine: [7204]: info: match_graph_event: Action 
> vip_monitor_20000 (9) confirmed on ogg-dvla-02 (rc=0)
> Jan 21 16:32:07 ogg-dvla-02 crmd: [3789]: info: process_lrm_event: LRM 
> operation ldirector_start_0 (call=24, rc=0) complete
> Jan 21 16:32:07 ogg-dvla-02 tengine: [7204]: info: match_graph_event: Action 
> ldirector_start_0 (11) confirmed on ogg-dvla-02 (rc=0)
> Jan 21 16:32:07 ogg-dvla-02 tengine: [7204]: info: te_pseudo_action: Pseudo 
> action 14 fired and confirmed
> Jan 21 16:32:07 ogg-dvla-02 tengine: [7204]: info: send_rsc_command: 
> Initiating action 12: ldirector_monitor_20000 on ogg-dvla-02
> Jan 21 16:32:07 ogg-dvla-02 crmd: [3789]: info: do_lrm_rsc_op: Performing 
> op=ldirector_monitor_20000 key=12:11:4f1659a1-c670-4bfc-972a-341fd26c1aba)
> Jan 21 16:32:07 ogg-dvla-02 crmd: [3789]: notice: 
> crmd_client_status_callback: Status update: Client ogg-dvla-01/crmd now has 
> status [offline]
> Jan 21 16:32:07 ogg-dvla-02 crmd: [3789]: info: process_lrm_event: LRM 
> operation ldirector_monitor_20000 (call=25, rc=0) complete
> Jan 21 16:32:07 ogg-dvla-02 tengine: [7204]: info: match_graph_event: Action 
> ldirector_monitor_20000 (12) confirmed on ogg-dvla-02 (rc=0)
> Jan 21 16:32:07 ogg-dvla-02 cib: [3785]: info: cib_process_shutdown_req: 
> Shutdown REQ from ogg-dvla-01
> Jan 21 16:32:08 ogg-dvla-02 cib: [3785]: info: sync_our_cib: Syncing CIB to 
> ogg-dvla-01
> Jan 21 16:32:08 ogg-dvla-02 ccm: [3784]: debug: quorum plugin: majority
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: Got an 
> event OC_EV_MS_INVALID from ccm
> Jan 21 16:32:08 ogg-dvla-02 ccm: [3784]: debug: cluster:linux-ha, 
> member_count=1, member_quorum_votes=100
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: no 
> mbr_track info
> Jan 21 16:32:08 ogg-dvla-02 ccm: [3784]: debug: total_node_count=2, 
> total_quorum_votes=200
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: Got an 
> event OC_EV_MS_NEW_MEMBERSHIP from ccm
> Jan 21 16:32:08 ogg-dvla-02 ccm: [3784]: debug: quorum plugin: twonodes
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: mem_handle_event: 
> instance=70, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> Jan 21 16:32:08 ogg-dvla-02 ccm: [3784]: debug: cluster:linux-ha, 
> member_count=1, member_quorum_votes=100
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: crmd_ccm_msg_callback: Quorum 
> (re)attained after event=NEW MEMBERSHIP (id=70)
> Jan 21 16:32:08 ogg-dvla-02 ccm: [3784]: debug: total_node_count=2, 
> total_quorum_votes=200
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: ccm_event_detail: NEW 
> MEMBERSHIP: trans=70, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
> Jan 21 16:32:08 ogg-dvla-02 ccm: [3784]: info: Break tie for 2 nodes cluster
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: ccm_event_detail:       
> CURRENT: ogg-dvla-02 [nodeid=1, born=70]
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: ccm_event_detail:       LOST: 
>    ogg-dvla-01 [nodeid=0, born=69]
> Jan 21 16:32:08 ogg-dvla-02 cib: [3785]: info: cib_client_status_callback: 
> Status update: Client ogg-dvla-01/cib now has status [leave]
> Jan 21 16:32:08 ogg-dvla-02 cib: [3785]: info: mem_handle_event: Got an event 
> OC_EV_MS_INVALID from ccm
> Jan 21 16:32:08 ogg-dvla-02 cib: [3785]: info: mem_handle_event: no mbr_track 
> info
> Jan 21 16:32:08 ogg-dvla-02 cib: [3785]: info: mem_handle_event: Got an event 
> OC_EV_MS_NEW_MEMBERSHIP from ccm
> Jan 21 16:32:08 ogg-dvla-02 cib: [3785]: info: mem_handle_event: instance=70, 
> nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
> Jan 21 16:32:08 ogg-dvla-02 cib: [3785]: info: cib_ccm_msg_callback: LOST: 
> ogg-dvla-01
> Jan 21 16:32:08 ogg-dvla-02 cib: [3785]: info: cib_ccm_msg_callback: PEER: 
> ogg-dvla-02
> Jan 21 16:32:08 ogg-dvla-02 tengine: [7204]: info: run_graph: Transition 11: 
> (Complete=12, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> Jan 21 16:32:08 ogg-dvla-02 tengine: [7204]: info: notify_crmd: Transition 11 
> status: te_complete - <null>
> Jan 21 16:32:08 ogg-dvla-02 crmd: [3789]: info: do_state_transition: State 
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_IPC_MESSAGE origin=route_message ]
> Jan 21 16:32:23 ogg-dvla-02 sshd[19003]: Accepted publickey for root from 
> 192.168.176.253 port 51320 ssh2
> Jan 21 16:32:39 ogg-dvla-02 heartbeat: [3275]: WARN: node ogg-dvla-01: is dead
> Jan 21 16:32:39 ogg-dvla-02 crmd: [3789]: notice: crmd_ha_status_callback: 
> Status update: Node ogg-dvla-01 now has status [dead]
> Jan 21 16:32:39 ogg-dvla-02 heartbeat: [3275]: info: Link ogg-dvla-01:eth0 
> dead.
>
>
> Perhaps it is because I'm rebooting through init rather than pulling
> the Ethernet cable or power cable out, but I don't have physical access
> to the box.
>
> Does anyone have any ideas about what's happening? Versions and config
> are below. Thanks.
>
> SUSE Linux Enterprise 10 SP2
>
> heartbeat-2.1.3-0.9
> heartbeat-pils-2.1.3-0.9
> heartbeat-stonith-2.1.3-0.9
> heartbeat-ldirectord-2.1.3-0.9
> heartbeat-cmpi-2.1.3-0.9
>
> /etc/ha.d/ha.cf:
> crm on
> udpport 694
> ucast eth0 10.167.30.71
> ucast eth0 10.167.30.73
> node ogg-dvla-01
> node ogg-dvla-02
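
A side note on the ha.cf above: it relies entirely on Heartbeat's default timings, which is consistent with the roughly 30-second gap between quorum loss and the "node ... is dead" message in the logs. If the peer is sometimes slow rather than actually dead, setting the timing directives explicitly can make membership changes more predictable. A sketch, with illustrative values rather than numbers tuned for this site:

```
keepalive 2     # seconds between heartbeat packets
warntime 10     # log a warning when heartbeats arrive late
deadtime 30     # declare the peer dead after this much silence
initdead 120    # allow extra slack while the stack is still booting
```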
>
> /etc/ha.d/ldirectord.cf:
> # /etc/ha.d/ldirectord.cf
> checktimeout=3
> checkinterval=5
> autoreload=yes
> logfile="/var/log/ldirectord.log"
> quiescent=yes
> virtual=10.167.30.76:80
>        [email protected]
>        fallback=127.0.0.1:80
>        real=10.167.30.71:80 gate
>        real=10.167.30.73:80 gate
>        service=http
>        request="test.html"
>        receive="Still alive"
>        scheduler=wlc
>        protocol=tcp
>        checktype=negotiate
> virtual=10.167.30.76:3306
>        [email protected]
>        fallback=127.0.0.1:3306
>        real=10.167.30.71:3306 gate
>        real=10.167.30.73:3306 gate
>        service=mysql
>        login="root"
>        passwd="password"
>        database="ldirector"
>        request="SELECT * from connectioncheck;"
>        scheduler=wlc
>        protocol=tcp
>        checktype=negotiate
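
On the ldirectord side, it may help to keep in mind that receive= is matched as a pattern against the page fetched via request=. Roughly, the behaviour is the following (this is a sketch of the check's logic, not ldirectord's actual code):

```python
import re

def negotiate_http_check(body: str, receive: str) -> bool:
    """Approximation of ldirectord's HTTP negotiate check: fetch the
    request= page from the real server and search the response body for
    the receive= pattern. No match means the real server is treated as
    failed; with quiescent=yes its weight is set to 0 rather than the
    entry being removed from the LVS table."""
    return re.search(receive, body) is not None
```

So a backend serving test.html without the exact string "Still alive" would be quiesced even though it is up.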
>
>
> CIB resources:
>
>  <resources>
>   <group id="load_balancer">
>     <meta_attributes id="load_balancer_meta_attrs">
>       <attributes>
>         <nvpair id="load_balancer_metaattr_target_role" name="target_role" 
> value="started"/>
>         <nvpair id="load_balancer_metaattr_ordered" name="ordered" 
> value="true"/>
>         <nvpair id="load_balancer_metaattr_collocated" name="collocated" 
> value="true"/>
>       </attributes>
>     </meta_attributes>
>     <primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
>       <instance_attributes id="vip_instance_attrs">
>         <attributes>
>           <nvpair id="d87cd780-3a51-419d-ac47-0fb150ec155b" name="ip" 
> value="10.167.30.76"/>
>           <nvpair id="a390efaa-2b48-406b-b122-a60c2af7b809" 
> name="lvs_support" value="true"/>
>         </attributes>
>       </instance_attributes>
>       <operations>
>         <op id="c1aa0a35-d440-4e06-806e-a2886d2cea0a" name="monitor" 
> interval="20" timeout="10" start_delay="0" on_fail="restart"/>
>       </operations>
>     </primitive>
>     <primitive id="ldirector" class="ocf" type="ldirectord" 
> provider="heartbeat">
>       <instance_attributes id="ldirector_instance_attrs">
>         <attributes>
>           <nvpair id="c89bec2c-82fa-4405-b672-bbc842ce108c" name="configfile" 
> value="/etc/ha.d/ldirectord.cf"/>
>         </attributes>
>       </instance_attributes>
>       <operations>
>         <op id="d77c87dc-831c-4ff5-8bd5-ea269e7ed3f3" name="monitor" 
> interval="20" timeout="10" start_delay="0" on_fail="restart"/>
>       </operations>
>     </primitive>
>   </group>
>  </resources>
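
The group itself looks sane: with ordered/collocated set to true, vip and ldirector start in document order on the same node, which matches the failover sequence in the good log. Since the CIB is plain XML, it is easy to sanity-check a fragment with any XML tool; a quick stdlib sketch (the literal below is a trimmed copy of the quoted fragment, attributes omitted for brevity):

```python
import xml.etree.ElementTree as ET

CIB_FRAGMENT = """
<resources>
  <group id="load_balancer">
    <primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat"/>
    <primitive id="ldirector" class="ocf" type="ldirectord" provider="heartbeat"/>
  </group>
</resources>
"""

root = ET.fromstring(CIB_FRAGMENT)
# List every primitive in the group, in the order the group starts them.
for prim in root.iter("primitive"):
    print(prim.get("id"), prim.get("type"))
```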
>
>
>
> --
> Darren Mansell <[email protected]>
> OpenGI Ltd
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
