[Linux-HA] Fencing prevents resource from failing over

abhishek.bagchi Sun, 25 Nov 2007 21:26:06 -0800

Hi,
I have a 2-node active/passive cluster (active node => "active", passive
node => "standby") running heartbeat 2.0.8. I recently enabled stonith.
The stonith device is an rsh device that tries to restart the cluster
node. However, something that used to work with stonith disabled has
stopped working: node failover on network cable disconnection. I believe
that, since the stonith device uses the network, the stonith fails and
hence the resource is left wherever it was running.
 
Can anyone please help resolve this problem (it is probably not a
problem, and this is how stonith is expected to work)? I would like to
know if there is any way to tell the passive node (which is effectively
the active node now) to give up trying to stonith and then start the
resource. I have attached my cib file and the logs from the passive node
when the cable is disconnected.
I have no problem with both nodes running the resource, as the active
node is cut off from the network anyway and cannot do any damage. The
standby log seems to say it has quorum, so I wonder why it doesn't start
the resources, in spite of the following being evident from the logs:
 
1. Standby marks active unclean 
2. Standby has quorum
3. Standby tries to move resources back to standby
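
For anyone reading the attached log, the three points above (plus the
STONITH scheduling) can be located with a small filter. This is only a
minimal sketch: it assumes the log keeps the "daemon[pid]: date" prefix
seen in the attachment and that wrapped lines belong to the previous
record. The function names and marker labels are made up for
illustration and are not part of heartbeat.

```python
import re

# A record starts with "daemon[pid]: YYYY/MM/DD_HH:MM:SS"; lines that do
# not match are treated as wrapped continuations of the previous record.
RECORD_START = re.compile(r"^\w+\[\d+\]: \d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}")

# Message fragments copied from the attached log; the labels are my own.
MARKERS = {
    "unclean": "is unclean",
    "quorum": "Quorum (re)attained",
    "move": "Move resource",
    "stonith": "for STONITH",
}


def join_records(lines):
    """Glue wrapped lines back together and collapse runs of whitespace."""
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if RECORD_START.match(text) or not records:
            records.append(text)
        else:
            records[-1] += " " + text
    return [" ".join(record.split()) for record in records]


def summarize(lines):
    """Return {label: [matching records]} for each marker."""
    hits = {label: [] for label in MARKERS}
    for record in join_records(lines):
        for label, fragment in MARKERS.items():
            if fragment in record:
                hits[label].append(record)
    return hits
```

Calling summarize() on the lines of the standby log (for example
summarize(open("standby.log")), where the file name is only a
placeholder) returns the matching records grouped by label.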
 
 
Thanks in advance,
Abhi.





heartbeat[31491]: 2007/11/25_17:58:34 WARN: node active: is dead
heartbeat[31491]: 2007/11/25_17:58:34 info: Link active:eth0 dead.
crmd[31505]: 2007/11/25_17:58:34 notice: crmd_ha_status_callback: Status 
update: Node active now has status [dead]
crmd[31505]: 2007/11/25_17:58:34 info: mem_handle_event: Got an event 
OC_EV_MS_NOT_PRIMARY from ccm
crmd[31505]: 2007/11/25_17:58:34 info: mem_handle_event: instance=4, nodes=2, 
new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
crmd[31505]: 2007/11/25_17:58:34 info: crmd_ccm_msg_callback: Quorum lost after 
event=NOT PRIMARY (id=4)
cib[31501]: 2007/11/25_17:58:34 info: cib_diff_notify: Local-only Change 
(client:31505, call: 23): 0.577.6836 (ok)
cib[31501]: 2007/11/25_17:58:34 info: mem_handle_event: Got an event 
OC_EV_MS_NOT_PRIMARY from ccm
cib[31501]: 2007/11/25_17:58:34 info: mem_handle_event: instance=4, nodes=2, 
new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
cib[1328]: 2007/11/25_17:58:34 info: write_cib_contents: Wrote version 
0.577.6836 of the CIB to disk (digest: 64a038d849d686fe705d878b7a42eeaa)
ccm[31500]: 2007/11/25_17:58:45 info: Break tie for 2 nodes cluster
cib[31501]: 2007/11/25_17:58:45 info: mem_handle_event: Got an event 
OC_EV_MS_INVALID from ccm
cib[31501]: 2007/11/25_17:58:45 info: mem_handle_event: no mbr_track info
cib[31501]: 2007/11/25_17:58:45 info: mem_handle_event: Got an event 
OC_EV_MS_NEW_MEMBERSHIP from ccm
crmd[31505]: 2007/11/25_17:58:45 info: mem_handle_event: Got an event 
OC_EV_MS_INVALID from ccm
cib[31501]: 2007/11/25_17:58:45 info: mem_handle_event: instance=5, nodes=1, 
new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
crmd[31505]: 2007/11/25_17:58:45 info: mem_handle_event: no mbr_track info
cib[31501]: 2007/11/25_17:58:45 info: cib_ccm_msg_callback: LOST: active
crmd[31505]: 2007/11/25_17:58:45 info: mem_handle_event: Got an event 
OC_EV_MS_NEW_MEMBERSHIP from ccm
cib[31501]: 2007/11/25_17:58:45 info: cib_ccm_msg_callback: PEER: standby
crmd[31505]: 2007/11/25_17:58:45 info: mem_handle_event: instance=5, nodes=1, 
new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
crmd[31505]: 2007/11/25_17:58:45 info: crmd_ccm_msg_callback: Quorum 
(re)attained after event=NEW MEMBERSHIP (id=5)
crmd[31505]: 2007/11/25_17:58:45 WARN: check_dead_member: Our DC node (active) 
left the cluster
crmd[31505]: 2007/11/25_17:58:45 info: ccm_event_detail: NEW MEMBERSHIP: 
trans=5, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
cib[31501]: 2007/11/25_17:58:45 info: cib_diff_notify: Local-only Change 
(client:31505, call: 24): 0.577.6836 (ok)
crmd[31505]: 2007/11/25_17:58:45 info: ccm_event_detail:        CURRENT: 
standby [nodeid=1, born=5]
cib[1329]: 2007/11/25_17:58:45 info: write_cib_contents: Wrote version 
0.577.6836 of the CIB to disk (digest: 540120e0277a81320578332734f68cd3)
crmd[31505]: 2007/11/25_17:58:45 info: ccm_event_detail:        LOST:    active 
[nodeid=0, born=2]
crmd[31505]: 2007/11/25_17:58:45 info: do_state_transition: standby: State 
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL 
origin=check_dead_member ]
crmd[31505]: 2007/11/25_17:58:45 info: update_dc: Set DC to <null> (<null>)
crmd[31505]: 2007/11/25_17:58:46 info: do_election_count_vote: Updated voted 
hash for standby to vote
crmd[31505]: 2007/11/25_17:58:46 info: do_election_count_vote: Election ignore: 
our vote (standby)
crmd[31505]: 2007/11/25_17:58:46 info: do_state_transition: standby: State 
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC 
cause=C_FSA_INTERNAL origin=do_election_check ]
crmd[31505]: 2007/11/25_17:58:46 info: start_subsystem: Starting sub-system 
"tengine"
crmd[31505]: 2007/11/25_17:58:46 info: start_subsystem: Starting sub-system 
"pengine"
tengine[1330]: 2007/11/25_17:58:46 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
crmd[31505]: 2007/11/25_17:58:46 info: do_dc_takeover: Taking over DC status 
for this partition
pengine[1331]: 2007/11/25_17:58:46 info: G_main_add_SignalHandler: Added signal 
handler for signal 15
tengine[1330]: 2007/11/25_17:58:46 info: G_main_add_TriggerHandler: Added 
signal manual handler
crmd[31505]: 2007/11/25_17:58:46 info: update_dc: Set DC to <null> (<null>)
cib[31501]: 2007/11/25_17:58:47 info: cib_process_readwrite: We are now in R/W 
mode
pengine[1331]: 2007/11/25_17:58:47 info: init_start: Starting pengine
crmd[31505]: 2007/11/25_17:58:47 info: do_dc_join_offer_all: join-1: Waiting on 
1 outstanding join acks
cib[31501]: 2007/11/25_17:58:47 info: cib_diff_notify: Update (client: 31505, 
call:27): 0.577.6836 -> 0.577.6837 (ok)
crmd[31505]: 2007/11/25_17:58:47 info: update_dc: Set DC to standby (1.0.7)
cib[1340]: 2007/11/25_17:58:47 info: write_cib_contents: Wrote version 
0.577.6837 of the CIB to disk (digest: f864dec595b7f4aaf46e482a4cbf7d37)
tengine[1330]: 2007/11/25_17:58:47 info: init_start: Registering TE UUID: 
ff3ef554-ce2d-47c2-8dfc-a2c6aab3c452
cib[31501]: 2007/11/25_17:58:47 info: cib_null_callback: Setting 
cib_diff_notify callbacks for tengine: on
tengine[1330]: 2007/11/25_17:58:47 info: set_graph_functions: Setting custom 
graph functions
cib[31501]: 2007/11/25_17:58:47 WARN: G_SIG_dispatch: Dispatch function for 
SIGCHLD was delayed 220 ms (> 100 ms) before being called (GSource: 0x8efc3c0)
tengine[1330]: 2007/11/25_17:58:48 info: unpack_graph: Unpacked transition -1: 
0 actions in 0 synapses
cib[31501]: 2007/11/25_17:58:48 info: G_SIG_dispatch: started at 1788375600 
should have started at 1788375578
tengine[1330]: 2007/11/25_17:58:48 info: init_start: Starting tengine
cib[31501]: 2007/11/25_17:58:48 WARN: G_SIG_dispatch: Dispatch function for 
SIGCHLD took too long to execute: 430 ms (> 10 ms) (GSource: 0x8efc3c0)
crmd[31505]: 2007/11/25_17:58:49 info: do_state_transition: standby: State 
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
cause=C_FSA_INTERNAL origin=check_join_state ]
crmd[31505]: 2007/11/25_17:58:49 info: do_state_transition: All 1 cluster nodes 
responded to the join offer.
crmd[31505]: 2007/11/25_17:58:49 info: update_attrd: Connecting to attrd...
cib[31501]: 2007/11/25_17:58:49 info: sync_our_cib: Syncing CIB to all peers
attrd[31504]: 2007/11/25_17:58:49 info: attrd_local_callback: Sending full 
refresh
cib[31501]: 2007/11/25_17:58:49 info: cib_diff_notify: Update (client: 31505, 
call:30): 0.577.6837 -> 0.577.6838 (ok)
tengine[1330]: 2007/11/25_17:58:49 info: te_update_diff: Processing diff 
(cib_update): 0.577.6837 -> 0.577.6838
cib[31501]: 2007/11/25_17:58:49 info: cib_diff_notify: Update (client: 31505, 
call:31): 0.577.6838 -> 0.578.6839 (ok)
tengine[1330]: 2007/11/25_17:58:49 info: te_update_diff: Processing diff 
(cib_bump): 0.577.6838 -> 0.578.6839
cib[31501]: 2007/11/25_17:58:49 info: cib_diff_notify: Update (client: 31505, 
call:32): 0.578.6839 -> 0.578.6840 (ok)
tengine[1330]: 2007/11/25_17:58:49 info: te_update_diff: Processing diff 
(cib_update): 0.578.6839 -> 0.578.6840
cib[1343]: 2007/11/25_17:58:49 info: write_cib_contents: Wrote version 
0.578.6840 of the CIB to disk (digest: 17d8045fa46813f1d3873f5c9c722abc)
crmd[31505]: 2007/11/25_17:58:49 info: update_dc: Set DC to standby (1.0.7)
crmd[31505]: 2007/11/25_17:58:50 info: append_restart_list: Resource Stonith:0 
does not support reloads
crmd[31505]: 2007/11/25_17:58:50 info: append_restart_list: Resource Stonith:1 
does not support reloads
crmd[31505]: 2007/11/25_17:58:50 info: do_dc_join_ack: join-1: Updating node 
state to member for standby)
cib[31501]: 2007/11/25_17:58:50 info: cib_diff_notify: Update (client: 31505, 
call:33): 0.578.6840 -> 0.578.6841 (ok)
tengine[1330]: 2007/11/25_17:58:50 info: te_update_diff: Processing diff 
(cib_update): 0.578.6840 -> 0.578.6841
tengine[1330]: 2007/11/25_17:58:50 info: process_graph_event: Action 
Proxy_10_114_31_238_monitor_0 initiated by a different transitioner
tengine[1330]: 2007/11/25_17:58:50 info: update_abort_priority: Abort priority 
upgraded to 1000000
tengine[1330]: 2007/11/25_17:58:50 info: update_abort_priority: 'DC 
Takeover'-class abort superceeded
crmd[31505]: 2007/11/25_17:58:51 info: do_state_transition: standby: State 
transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED 
cause=C_FSA_INTERNAL origin=check_join_state ]
cib[1344]: 2007/11/25_17:58:51 info: write_cib_contents: Wrote version 
0.578.6841 of the CIB to disk (digest: ed19fc4a413c6b240135b67eacb01bad)
tengine[1330]: 2007/11/25_17:58:51 info: process_graph_event: Action 
Stonith:0_monitor_0 initiated by a different transitioner
crmd[31505]: 2007/11/25_17:58:51 info: do_state_transition: All 1 cluster nodes 
are eligable to run resources.
tengine[1330]: 2007/11/25_17:58:51 info: process_graph_event: Action 
IPaddr_10_114_31_238_monitor_0 initiated by a different transitioner
pengine[1331]: 2007/11/25_17:58:51 info: log_data_element: process_pe_message: 
[generation] <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" 
num_peers="2" cib_feature_revision="1.3" generated="true" ccm_transition="5" 
dc_uuid="cfd38e2f-2e94-4c49-9068-3aead25c9476" epoch="578" num_updates="6841"/>
tengine[1330]: 2007/11/25_17:58:51 info: process_graph_event: Action 
Stonith:1_monitor_0 initiated by a different transitioner
pengine[1331]: 2007/11/25_17:58:51 notice: cluster_option: Using default value 
'true' for cluster option 'symmetric-cluster'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'reboot' for cluster option 'stonith-action'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'0' for cluster option 'default-resource-failure-stickiness'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'true' for cluster option 'is-managed-default'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'60s' for cluster option 'cluster-delay'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'20s' for cluster option 'default-action-timeout'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'true' for cluster option 'stop-orphan-resources'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'true' for cluster option 'stop-orphan-actions'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'false' for cluster option 'remove-after-stop'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'-1' for cluster option 'pe-error-series-max'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'-1' for cluster option 'pe-warn-series-max'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'-1' for cluster option 'pe-input-series-max'
pengine[1331]: 2007/11/25_17:58:52 notice: cluster_option: Using default value 
'true' for cluster option 'startup-fencing'
pengine[1331]: 2007/11/25_17:58:52 notice: unpack_config: On loss of CCM 
Quorum: Ignore
pengine[1331]: 2007/11/25_17:58:52 WARN: determine_online_status_fencing: Node 
active (6ef6bc8d-de62-49aa-8ed3-e4fa300cff8c) is un-expectedly down
pengine[1331]: 2007/11/25_17:58:52 info: determine_online_status_fencing:       
ha_state=dead, ccm_state=false, crm_state=online, join_state=down, 
expected=member
pengine[1331]: 2007/11/25_17:58:52 WARN: determine_online_status: Node active 
is unclean
pengine[1331]: 2007/11/25_17:58:52 info: determine_online_status: Node standby 
is online
pengine[1331]: 2007/11/25_17:58:52 info: group_print: Resource Group: proxy_rsc
pengine[1331]: 2007/11/25_17:58:52 info: native_print:     IPaddr_10_114_31_238 
(heartbeat::ocf:myIPAddr):      Started active
pengine[1331]: 2007/11/25_17:58:52 info: native_print:     Proxy_10_114_31_238  
(heartbeat::ocf:myOcf): Started active
pengine[1331]: 2007/11/25_17:58:53 info: clone_print: Clone Set: stonith_rsc
pengine[1331]: 2007/11/25_17:58:53 info: native_print:     Stonith:0    
(stonith:external/rsh): Started standby
pengine[1331]: 2007/11/25_17:58:53 info: native_print:     Stonith:1    
(stonith:external/rsh): Started active
pengine[1331]: 2007/11/25_17:58:53 info: native_color: Combine scores from 
Proxy_10_114_31_238 and IPaddr_10_114_31_238
pengine[1331]: 2007/11/25_17:58:53 notice: NoRoleChange: Move  resource 
IPaddr_10_114_31_238    (active -> standby)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Action 
IPaddr_10_114_31_238_stop_0 on active is unrunnable (offline)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Marking node active 
unclean
pengine[1331]: 2007/11/25_17:58:53 notice: Recurring: standby      
IPaddr_10_114_31_238_monitor_5000
pengine[1331]: 2007/11/25_17:58:53 notice: NoRoleChange: Move  resource 
Proxy_10_114_31_238     (active -> standby)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Action 
Proxy_10_114_31_238_stop_0 on active is unrunnable (offline)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Marking node active 
unclean
pengine[1331]: 2007/11/25_17:58:53 notice: Recurring: standby      
Proxy_10_114_31_238_monitor_15000
pengine[1331]: 2007/11/25_17:58:53 WARN: native_color: Resource Stonith:1 
cannot run anywhere
pengine[1331]: 2007/11/25_17:58:53 notice: NoRoleChange: Leave resource 
Stonith:0       (standby)
pengine[1331]: 2007/11/25_17:58:53 notice: StopRsc:   active    Stop Stonith:1
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Action Stonith:1_stop_0 
on active is unrunnable (offline)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Marking node active 
unclean
pengine[1331]: 2007/11/25_17:58:53 WARN: stage6: Scheduling Node active for 
STONITH
pengine[1331]: 2007/11/25_17:58:53 WARN: native_stop_constraints: Stop of 
failed resource IPaddr_10_114_31_238 is implict after active is fenced
pengine[1331]: 2007/11/25_17:58:53 info: native_stop_constraints: Re-creating 
actions for proxy_rsc
pengine[1331]: 2007/11/25_17:58:53 notice: NoRoleChange: Move  resource 
IPaddr_10_114_31_238    (active -> standby)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Action 
IPaddr_10_114_31_238_stop_0 on active is unrunnable (offline)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Marking node active 
unclean
pengine[1331]: 2007/11/25_17:58:53 notice: Recurring: standby      
IPaddr_10_114_31_238_monitor_5000
pengine[1331]: 2007/11/25_17:58:53 notice: NoRoleChange: Move  resource 
Proxy_10_114_31_238     (active -> standby)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Action 
Proxy_10_114_31_238_stop_0 on active is unrunnable (offline)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Marking node active 
unclean
pengine[1331]: 2007/11/25_17:58:53 notice: Recurring: standby      
Proxy_10_114_31_238_monitor_15000
pengine[1331]: 2007/11/25_17:58:53 WARN: native_stop_constraints: Stop of 
failed resource Proxy_10_114_31_238 is implict after active is fenced
pengine[1331]: 2007/11/25_17:58:53 info: native_stop_constraints: Re-creating 
actions for proxy_rsc
pengine[1331]: 2007/11/25_17:58:53 notice: NoRoleChange: Move  resource 
IPaddr_10_114_31_238    (active -> standby)
pengine[1331]: 2007/11/25_17:58:53 WARN: custom_action: Action 
IPaddr_10_114_31_238_stop_0 on active is unrunnable (offline)
pengine[1331]: 2007/11/25_17:58:54 WARN: custom_action: Marking node active 
unclean
pengine[1331]: 2007/11/25_17:58:54 notice: Recurring: standby      
IPaddr_10_114_31_238_monitor_5000
pengine[1331]: 2007/11/25_17:58:54 notice: NoRoleChange: Move  resource 
Proxy_10_114_31_238     (active -> standby)
pengine[1331]: 2007/11/25_17:58:54 WARN: custom_action: Action 
Proxy_10_114_31_238_stop_0 on active is unrunnable (offline)
pengine[1331]: 2007/11/25_17:58:54 WARN: custom_action: Marking node active 
unclean
pengine[1331]: 2007/11/25_17:58:54 notice: Recurring: standby      
Proxy_10_114_31_238_monitor_15000
pengine[1331]: 2007/11/25_17:58:54 WARN: native_stop_constraints: Stop of 
failed resource Stonith:1 is implict after active is fenced
pengine[1331]: 2007/11/25_17:58:54 info: native_stop_constraints: Re-creating 
actions for stonith_rsc
pengine[1331]: 2007/11/25_17:58:54 notice: NoRoleChange: Leave resource 
Stonith:0       (standby)
pengine[1331]: 2007/11/25_17:58:54 notice: StopRsc:   active    Stop Stonith:1
pengine[1331]: 2007/11/25_17:58:54 WARN: custom_action: Action Stonith:1_stop_0 
on active is unrunnable (offline)
pengine[1331]: 2007/11/25_17:58:54 WARN: custom_action: Marking node active 
unclean
pengine[1331]: 2007/11/25_17:58:54 WARN: crm_mem_stats: Potential memory leak 
detected: 6507 alloc's vs. 6504 free's (3) (32892 bytes not freed: req=16946, 
alloc'd=368884)
pengine[1331]: 2007/11/25_17:58:54 WARN: process_pe_message: Unfree'd memory
pengine[1331]: 2007/11/25_17:58:54 WARN: process_pe_message: Transition 0: 
WARNINGs found during PE processing. PEngine Input stored in: 
/var/lib/heartbeat/pengine/pe-warn-9808.bz2
pengine[1331]: 2007/11/25_17:58:54 info: process_pe_message: Configuration 
WARNINGs found during PE processing.  Please run "crm_verify -L" to identify 
issues.
pengine[1331]: 2007/11/25_17:58:54 info: log_data_element: process_pe_message: 
[generation] <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" 
num_peers="2" cib_feature_revision="1.3" generated="true" ccm_transition="5" 
dc_uuid="cfd38e2f-2e94-4c49-9068-3aead25c9476" epoch="578" num_updates="6841"/>
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'true' for cluster option 'symmetric-cluster'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'reboot' for cluster option 'stonith-action'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'0' for cluster option 'default-resource-failure-stickiness'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'true' for cluster option 'is-managed-default'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'60s' for cluster option 'cluster-delay'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'20s' for cluster option 'default-action-timeout'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'true' for cluster option 'stop-orphan-resources'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'true' for cluster option 'stop-orphan-actions'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'false' for cluster option 'remove-after-stop'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'-1' for cluster option 'pe-error-series-max'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'-1' for cluster option 'pe-warn-series-max'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'-1' for cluster option 'pe-input-series-max'
pengine[1331]: 2007/11/25_17:58:54 notice: cluster_option: Using default value 
'true' for cluster option 'startup-fencing'
pengine[1331]: 2007/11/25_17:58:54 notice: unpack_config: On loss of CCM 
Quorum: Ignore
pengine[1331]: 2007/11/25_17:58:55 WARN: determine_online_status_fencing: Node 
active (6ef6bc8d-de62-49aa-8ed3-e4fa300cff8c) is un-expectedly down
pengine[1331]: 2007/11/25_17:58:55 info: determine_online_status_fencing:       
ha_state=dead, ccm_state=false, crm_state=online, join_state=down, 
expected=member
pengine[1331]: 2007/11/25_17:58:55 WARN: determine_online_status: Node active 
is unclean
pengine[1331]: 2007/11/25_17:58:55 info: determine_online_status: Node standby 
is online
pengine[1331]: 2007/11/25_17:58:55 info: group_print: Resource Group: proxy_rsc
pengine[1331]: 2007/11/25_17:58:55 info: native_print:     IPaddr_10_114_31_238 
(heartbeat::ocf:myIPAddr):      Started active
pengine[1331]: 2007/11/25_17:58:55 info: native_print:     Proxy_10_114_31_238  
(heartbeat::ocf:myOcf): Started active
pengine[1331]: 2007/11/25_17:58:55 info: clone_print: Clone Set: stonith_rsc
pengine[1331]: 2007/11/25_17:58:55 info: native_print:     Stonith:0    
(stonith:external/rsh): Started standby
pengine[1331]: 2007/11/25_17:58:55 info: native_print:     Stonith:1    
(stonith:external/rsh): Started active

 <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" generated="true" ccm_transition="2" dc_uuid="cfd38e2f-2e94-4c49-9068-3aead25c9476" epoch="579" num_updates="6849" cib-last-written="Sun Nov 25 17:59:17 2007">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="INFINITY"/>
           <nvpair id="cib-bootstrap-options-stonith" name="stonith-enabled" value="TRUE"/>
           <nvpair id="cib-bootstrap-options-quorum" name="no-quorum-policy" value="IGNORE"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="6f6d5305-43e3-46f9-a8d7-7966218bab0b" uname="standby" type="normal"/>
       <node id="bf83d809-e951-4c04-b605-28b5ead6ccfe" uname="active" type="normal"/>
       <node id="6ef6bc8d-de62-49aa-8ed3-e4fa300cff8c" uname="active" type="normal"/>
       <node id="cfd38e2f-2e94-4c49-9068-3aead25c9476" uname="standby" type="normal"/>
     </nodes>
     <resources>
       <group id="proxy_rsc">
         <primitive id="IPaddr_10_114_31_238" class="ocf" type="myIPAddr" provider="heartbeat">
           <operations>
             <op id="a" name="start" timeout="10s"/>
             <op id="b" name="stop" timeout="10s"/>
             <op id="c" name="monitor" timeout="2s" interval="5s"/>
           </operations>
         </primitive>
         <primitive id="Proxy_10_114_31_238" class="ocf" type="myOcf" provider="heartbeat">
           <operations>
             <op id="1" name="start" timeout="10s"/>
             <op id="2" name="stop" timeout="10s"/>
             <op id="3" name="monitor" timeout="5s" interval="15s"/>
           </operations>
         </primitive>
       </group>
       <clone id="stonith_rsc">
         <instance_attributes id="stonith_attr">
           <attributes>
             <nvpair id="stonith_clone_max" name="clone_max" value="2"/>
             <nvpair id="stonith_clone_node_max" name="clone_node_max" value="1"/>
           </attributes>
         </instance_attributes>
         <primitive id="Stonith" class="stonith" type="external/rsh" provider="heartbeat">
           <operations>
             <op id="4" name="start" timeout="10s" prereq="nothing"/>
             <op id="5" name="monitor" timeout="5s" interval="15s"/>
           </operations>
           <instance_attributes id="stonith_rsh_attr">
             <attributes>
               <nvpair id="stonith_rsh_attr1" name="hostlist" value="active,standby"/>
             </attributes>
           </instance_attributes>
         </primitive>
       </clone>
     </resources>
     <constraints>
       <rsc_location id="proxy_rsc_location" rsc="proxy_rsc">
         <rule id="prefered_location_proxy_rsc" score="INFINITY">
           <expression id="prefered_location_proxy_rsc_expr" attribute="#uname" operation="eq" value="active"/>
         </rule>
       </rsc_location>
       <rsc_location id="proxy_stonith_active_rsc_location" rsc="stonith_rsc">
         <rule id="prefered_location_proxy_stonith_active_rsc" score="INFINITY">
           <expression id="prefered_location_proxy_stonith_active_rsc_expr" attribute="#uname" operation="eq" value="standby"/>
         </rule>
       </rsc_location>
       <rsc_location id="proxy_stonith_standby_rsc_location" rsc="stonith_rsc">
         <rule id="prefered_location_proxy_stonith_standby_rsc" score="INFINITY">
           <expression id="prefered_location_proxy_stonith_standby_rsc_expr" attribute="#uname" operation="eq" value="active"/>
         </rule>
       </rsc_location>
     </constraints>
   </configuration>
 </cib>
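
A quick way to read back what the CIB above actually sets is to parse it
and list the cluster options and the node entries. The sketch below is
illustrative only: it uses Python's standard xml.etree module on a
trimmed copy of the attachment, and the helper names are hypothetical.
It shows that stonith-enabled is TRUE, that no-quorum-policy is IGNORE,
and that each uname appears under two different ids in the nodes
section. Whether the extra node entries matter for this problem is not
something the sketch can tell.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the attached CIB: only part of crm_config and the
# nodes section are kept.
SAMPLE_CIB = """
<cib admin_epoch="0" epoch="579">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="cib-bootstrap-options-stonith" name="stonith-enabled" value="TRUE"/>
          <nvpair id="cib-bootstrap-options-quorum" name="no-quorum-policy" value="IGNORE"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="6f6d5305-43e3-46f9-a8d7-7966218bab0b" uname="standby" type="normal"/>
      <node id="bf83d809-e951-4c04-b605-28b5ead6ccfe" uname="active" type="normal"/>
      <node id="6ef6bc8d-de62-49aa-8ed3-e4fa300cff8c" uname="active" type="normal"/>
      <node id="cfd38e2f-2e94-4c49-9068-3aead25c9476" uname="standby" type="normal"/>
    </nodes>
  </configuration>
</cib>
"""


def cluster_options(root):
    """Return {name: value} for every nvpair under <crm_config>."""
    config = root.find("configuration/crm_config")
    if config is None:
        return {}
    return {nv.get("name"): nv.get("value") for nv in config.iter("nvpair")}


def node_ids_by_uname(root):
    """Group node ids by uname so repeated unames are easy to spot."""
    ids = defaultdict(list)
    for node in root.findall("configuration/nodes/node"):
        ids[node.get("uname")].append(node.get("id"))
    return dict(ids)


root = ET.fromstring(SAMPLE_CIB)
print(cluster_options(root))
for uname, ids in node_ids_by_uname(root).items():
    print(uname, len(ids), "entries:", ", ".join(ids))
```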

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
