Hi,

On 26-Nov-10, at 1:41 AM, Andrew Beekhof wrote:

>> The problem here is that these spurious node failures cause Pacemaker
>> to initiate unnecessary resource migrations.  Is it normal for the
>> cluster to become confused for a while when the network connection to
>> a node is suddenly restored?
>
> It's normal for the CCM (part of heartbeat) and used to be normal for  
> corosync.
> These days I think corosync does a better job in these scenarios.

Does Pacemaker have an option to hold off on resource reassignments  
until the node membership reported by Heartbeat / Corosync has  
stabilized?  Having a rejoining node cause the cluster to incorrectly  
drop good nodes seems like a serious problem for installations that  
don't use STONITH to keep a once-dead node down.
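(For reference, the closest knob I'm aware of is the crmd-transition-delay  
cluster property, which delays recovery transitions for a fixed interval so  
closely-spaced membership events can settle before the PE acts. Treat this  
as a sketch rather than a confirmed fix -- the 10s value is arbitrary and I  
haven't verified it addresses this particular flap:)

```shell
# Hedged sketch: delay recovery transitions so short-lived membership
# flaps can settle before the policy engine computes a new transition.
# The 10s value is an arbitrary placeholder.
crm configure property crmd-transition-delay="10s"
```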


>>  Or is this happening because using
>> iptables is not a fair test of how the system will respond during a
>> network split?
>
> Unless you've got _really_ quick hands, you're also creating an
> asymmetric network split.
> ie. A can see B but B can't see A.
>
> This would be causing additional confusion at the messaging level.

I retested using ifdown / ifup to be sure I was getting a consistent,  
symmetric split of the network.  Unfortunately, the problem still  
occurs.
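(In case anyone wants to reproduce a roughly symmetric split with iptables  
instead, something like the following -- peer address is a placeholder --  
drops both directions back-to-back; note there is still a brief asymmetric  
window between the two rules, which is why ifdown is the cleaner test:)

```shell
#!/bin/sh
# Hedged sketch: cut traffic to/from a peer node in both directions.
# PEER is a hypothetical test-cluster address; adjust as needed.
PEER=192.168.122.12
iptables -I INPUT  -s "$PEER" -j DROP   # stop receiving from the peer
iptables -I OUTPUT -d "$PEER" -j DROP   # stop sending to the peer
```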

Just in case it's helpful, here's a snippet from the logs showing node  
"test2" returning after a network outage.  You'll see that node  
"test1" is incorrectly marked as failed for a while, even though  
test1, test3, and the test1 <-> test3 network connection were up  
at all times.  These logs are from an Ubuntu 10.04 system running the  
default versions of Pacemaker (1.0.8+hg15494-2ubuntu2) and Heartbeat  
(3.0.3-1ubuntu1).

Dec 15 02:42:16 test3 heartbeat: [1109]: CRIT: Cluster node test2  
returning after partition.
Dec 15 02:42:16 test3 heartbeat: [1109]: info: For information on  
cluster partitions, See URL: http://linux-ha.org/wiki/Split_Brain
Dec 15 02:42:16 test3 heartbeat: [1109]: WARN: Deadtime value may be  
too small.
Dec 15 02:42:16 test3 heartbeat: [1109]: info: See FAQ for information  
on tuning deadtime.
Dec 15 02:42:16 test3 heartbeat: [1109]: info: URL: 
http://linux-ha.org/wiki/FAQ#Heavy_Load
Dec 15 02:42:16 test3 heartbeat: [1109]: info: Link test2:eth0 up.
Dec 15 02:42:16 test3 heartbeat: [1109]: WARN: Late heartbeat: Node  
test2: interval 125220 ms
Dec 15 02:42:16 test3 heartbeat: [1109]: info: Status update for node  
test2: status active
Dec 15 02:42:16 test3 cib: [1124]: WARN: cib_peer_callback: Discarding  
cib_apply_diff message (6ba) from test2: not in our membership
Dec 15 02:42:16 test3 crmd: [1128]: notice: crmd_ha_status_callback:  
Status update: Node test2 now has status [active] (DC=true)
Dec 15 02:42:16 test3 crmd: [1128]: info: crm_update_peer_proc:  
test2.ais is now online
Dec 15 02:42:17 test3 ccm: [1123]: debug: quorum plugin: majority
Dec 15 02:42:17 test3 ccm: [1123]: debug: cluster:linux-ha,  
member_count=1, member_quorum_votes=100
Dec 15 02:42:17 test3 ccm: [1123]: debug: total_node_count=3,  
total_quorum_votes=300
Dec 15 02:42:17 test3 cib: [1124]: WARN: cib_peer_callback: Discarding  
cib_apply_diff message (6bc) from test2: not in our membership
Dec 15 02:42:17 test3 cib: [1124]: info: mem_handle_event: Got an  
event OC_EV_MS_INVALID from ccm
Dec 15 02:42:17 test3 cib: [1124]: info: mem_handle_event: no  
mbr_track info
Dec 15 02:42:17 test3 cib: [1124]: info: mem_handle_event: Got an  
event OC_EV_MS_INVALID from ccm
Dec 15 02:42:17 test3 cib: [1124]: info: mem_handle_event:  
instance=33, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4
Dec 15 02:42:17 test3 cib: [1124]: info: cib_ccm_msg_callback:  
Processing CCM event=INVALID (id=33)
Dec 15 02:42:17 test3 cib: [1124]: info: crm_update_peer: Node test1:  
id=0 state=lost (new) addr=(null) votes=-1 born=30 seen=32  
proc=00000000000000000000000000000302
Dec 15 02:42:17 test3 crmd: [1128]: info: mem_handle_event: Got an  
event OC_EV_MS_INVALID from ccm
Dec 15 02:42:17 test3 crmd: [1128]: info: mem_handle_event: no  
mbr_track info
Dec 15 02:42:17 test3 crmd: [1128]: info: mem_handle_event: Got an  
event OC_EV_MS_INVALID from ccm
Dec 15 02:42:17 test3 crmd: [1128]: info: mem_handle_event:  
instance=33, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4
Dec 15 02:42:17 test3 crmd: [1128]: info: crmd_ccm_msg_callback:  
Quorum lost after event=INVALID (id=33)
Dec 15 02:42:17 test3 crmd: [1128]: info: crm_update_quorum: Updating  
quorum status to false (call=353)
Dec 15 02:42:17 test3 crmd: [1128]: info: ccm_event_detail: INVALID:  
trans=33, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=4
Dec 15 02:42:17 test3 crmd: [1128]: info: ccm_event_detail:  
#011CURRENT: test3 [nodeid=2, born=33]
Dec 15 02:42:17 test3 crmd: [1128]: info: ccm_event_detail:  
#011LOST:    test1 [nodeid=0, born=30]
Dec 15 02:42:17 test3 crmd: [1128]: info: ais_status_callback: status:  
test1 is now lost (was member)

--> Notice here that test1 is now marked as down at 02:42:17

Dec 15 02:42:17 test3 crmd: [1128]: info: crm_update_peer: Node test1:  
id=0 state=lost (new) addr=(null) votes=-1 born=30 seen=32  
proc=00000000000000000000000000000202
Dec 15 02:42:17 test3 crmd: [1128]: info: erase_node_from_join:  
Removed node test1 from join calculations: welcomed=0 itegrated=0  
finalized=0 confirmed=1
Dec 15 02:42:17 test3 crmd: [1128]: info: populate_cib_nodes_ha:  
Requesting the list of configured nodes
Dec 15 02:42:17 test3 cib: [1124]: info: log_data_element: cib:diff: -  
<cib have-quorum="1" admin_epoch="1" epoch="279" num_updates="16" />
Dec 15 02:42:17 test3 cib: [1124]: info: log_data_element: cib:diff: +  
<cib have-quorum="0" admin_epoch="1" epoch="280" num_updates="1" />
Dec 15 02:42:17 test3 cib: [1124]: info: cib_process_request:  
Operation complete: op cib_modify for section cib (origin=local/crmd/ 
353, version=1.280.1): ok (rc=0)
Dec 15 02:42:17 test3 cib: [1239]: info: write_cib_contents: Archived  
previous version as /var/lib/heartbeat/crm/cib-71.raw
Dec 15 02:42:17 test3 cib: [1239]: info: write_cib_contents: Wrote  
version 1.280.0 of the CIB to disk (digest:  
4ceccd803c6229ad6dd3f1b939e3cb06)
Dec 15 02:42:17 test3 cib: [1239]: info: retrieveCib: Reading cluster  
configuration from: /var/lib/heartbeat/crm/cib.MU18pF (digest: /var/ 
lib/heartbeat/crm/cib.4gj2D3)
Dec 15 02:42:19 test3 crmd: [1128]: info: abort_transition_graph:  
need_abort:59 - Triggered transition abort (complete=1) : Non-status  
change
Dec 15 02:42:19 test3 crmd: [1128]: info: need_abort: Aborting on  
change to have-quorum
Dec 15 02:42:19 test3 crmd: [1128]: info: do_state_transition: State  
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC  
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Dec 15 02:42:19 test3 crmd: [1128]: info: do_state_transition: All 1  
cluster nodes are eligible to run resources.
Dec 15 02:42:19 test3 crmd: [1128]: info: do_pe_invoke: Query 356:  
Requesting the current CIB: S_POLICY_ENGINE
Dec 15 02:42:19 test3 cib: [1124]: info: cib_process_request:  
Operation complete: op cib_modify for section nodes (origin=local/crmd/ 
354, version=1.280.1): ok (rc=0)
Dec 15 02:42:19 test3 crmd: [1128]: WARN: match_down_event: No match  
for shutdown action on b3a4cc48-7707-4e47-aa10-3cd34230cebc
Dec 15 02:42:19 test3 crmd: [1128]: info: te_update_diff: Stonith/ 
shutdown of b3a4cc48-7707-4e47-aa10-3cd34230cebc not matched
Dec 15 02:42:19 test3 crmd: [1128]: info: abort_transition_graph:  
te_update_diff:191 - Triggered transition abort (complete=1,  
tag=node_state, id=b3a4cc48-7707-4e47-aa10-3cd34230cebc, magic=NA,  
cib=1.280.2) : Node failure
Dec 15 02:42:19 test3 crmd: [1128]: info: do_pe_invoke: Query 357:  
Requesting the current CIB: S_POLICY_ENGINE
Dec 15 02:42:19 test3 crmd: [1128]: info: do_pe_invoke_callback:  
Invoking the PE: query=357, ref=pe_calc-dc-1292380939-203, seq=33,  
quorate=0
Dec 15 02:42:19 test3 pengine: [1137]: info: unpack_config: Node  
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 15 02:42:19 test3 pengine: [1137]: WARN: cluster_status: We do not  
have quorum - fencing and resource management disabled
Dec 15 02:42:19 test3 pengine: [1137]: ERROR: unpack_resources:  
Resource start-up disabled since no STONITH resources have been defined
Dec 15 02:42:19 test3 pengine: [1137]: ERROR: unpack_resources: Either  
configure some or disable STONITH with the stonith-enabled option
Dec 15 02:42:19 test3 pengine: [1137]: ERROR: unpack_resources: NOTE:  
Clusters with shared data need STONITH to ensure data integrity
Dec 15 02:42:19 test3 pengine: [1137]: info: determine_online_status:  
Node test3 is online
Dec 15 02:42:19 test3 pengine: [1137]: WARN: pe_fence_node: Node test2  
will be fenced because it is un-expectedly down
Dec 15 02:42:19 test3 pengine: [1137]: info:  
determine_online_status_fencing: #011ha_state=active, ccm_state=false,  
crm_state=online, join_state=pending, expected=member
Dec 15 02:42:19 test3 pengine: [1137]: WARN: determine_online_status:  
Node test2 is unclean
Dec 15 02:42:19 test3 pengine: [1137]: WARN: pe_fence_node: Node test1  
will be fenced because it is un-expectedly down
Dec 15 02:42:19 test3 pengine: [1137]: info:  
determine_online_status_fencing: #011ha_state=active, ccm_state=false,  
crm_state=online, join_state=member, expected=member
Dec 15 02:42:19 test3 pengine: [1137]: WARN: determine_online_status:  
Node test1 is unclean
Dec 15 02:42:19 test3 pengine: [1137]: WARN: stage6: Node test2 is  
unclean!
Dec 15 02:42:19 test3 pengine: [1137]: WARN: stage6: Node test1 is  
unclean!
Dec 15 02:42:19 test3 pengine: [1137]: notice: stage6: Cannot fence  
unclean nodes until quorum is attained (or no-quorum-policy is set to  
ignore)
Dec 15 02:42:19 test3 crmd: [1128]: info: do_state_transition: State  
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS  
cause=C_IPC_MESSAGE origin=handle_response ]
Dec 15 02:42:19 test3 crmd: [1128]: info: unpack_graph: Unpacked  
transition 22: 0 actions in 0 synapses
Dec 15 02:42:19 test3 crmd: [1128]: info: do_te_invoke: Processing  
graph 22 (ref=pe_calc-dc-1292380939-203) derived from /var/lib/pengine/ 
pe-warn-499.bz2
Dec 15 02:42:19 test3 crmd: [1128]: info: run_graph:  
====================================================
Dec 15 02:42:19 test3 crmd: [1128]: notice: run_graph: Transition 22  
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/ 
lib/pengine/pe-warn-499.bz2): Complete
Dec 15 02:42:19 test3 crmd: [1128]: info: te_graph_trigger: Transition  
22 is now complete
Dec 15 02:42:19 test3 crmd: [1128]: info: notify_crmd: Transition 22  
status: done - <null>
Dec 15 02:42:19 test3 crmd: [1128]: info: do_state_transition: State  
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS  
cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 15 02:42:19 test3 crmd: [1128]: info: do_state_transition:  
Starting PEngine Recheck Timer
Dec 15 02:42:20 test3 pengine: [1137]: WARN: process_pe_message:  
Transition 22: WARNINGs found during PE processing. PEngine Input  
stored in: /var/lib/pengine/pe-warn-499.bz2
Dec 15 02:42:20 test3 pengine: [1137]: info: process_pe_message:  
Configuration ERRORs found during PE processing.  Please run  
"crm_verify -L" to identify issues.
Dec 15 02:42:20 test3 ccm: [1123]: debug: quorum plugin: majority
Dec 15 02:42:20 test3 ccm: [1123]: debug: cluster:linux-ha,  
member_count=2, member_quorum_votes=200
Dec 15 02:42:20 test3 ccm: [1123]: debug: total_node_count=3,  
total_quorum_votes=300
Dec 15 02:42:20 test3 crmd: [1128]: info: mem_handle_event: Got an  
event OC_EV_MS_INVALID from ccm
Dec 15 02:42:20 test3 crmd: [1128]: info: mem_handle_event: no  
mbr_track info
Dec 15 02:42:20 test3 crmd: [1128]: info: mem_handle_event: Got an  
event OC_EV_MS_NEW_MEMBERSHIP from ccm
Dec 15 02:42:20 test3 crmd: [1128]: info: mem_handle_event:  
instance=34, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=5
Dec 15 02:42:20 test3 crmd: [1128]: info: crmd_ccm_msg_callback:  
Quorum (re)attained after event=NEW MEMBERSHIP (id=34)
Dec 15 02:42:20 test3 crmd: [1128]: info: crm_update_quorum: Updating  
quorum status to true (call=358)
Dec 15 02:42:20 test3 crmd: [1128]: info: ccm_event_detail: NEW  
MEMBERSHIP: trans=34, nodes=2, new=1, lost=0 n_idx=0, new_idx=2,  
old_idx=5
Dec 15 02:42:20 test3 crmd: [1128]: info: ccm_event_detail:  
#011CURRENT: test3 [nodeid=2, born=1]
Dec 15 02:42:20 test3 crmd: [1128]: info: ccm_event_detail:  
#011CURRENT: test1 [nodeid=0, born=34]
Dec 15 02:42:20 test3 crmd: [1128]: info: ccm_event_detail:  
#011NEW:     test1 [nodeid=0, born=34]
Dec 15 02:42:20 test3 crmd: [1128]: info: ais_status_callback: status:  
test1 is now member (was lost)

--> test1 is marked as a member again at 02:42:20



-- Andrew
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
