Hi,

On Tue, Mar 08, 2011 at 05:32:44PM +0100, Sascha Hagedorn wrote:
> Hi Dejan,
>
> thank you for your answer. I added an external/ssh stonith resource to
> test this, and it resolved the problem. It wasn't clear to me that the
> stonith resource does more than shooting the other node. Apparently some
> cluster parameters are being set too, so the system stays clean. During
> the test, my understanding was that when I cut the power of one node I
> wouldn't need a stonith device to shoot it.
Hmm, I wonder how external/ssh could have solved this particular issue:
if you pull the plug, external/ssh will never be able to fence that node.
You really need a usable stonith device; external/ssh is for testing only.

Thanks,

Dejan

> Thanks again,
> Sascha
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dejan Muhamedagic
> Sent: Monday, March 7, 2011 16:43
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Server becomes unresponsive after node failure
>
> Hi,
>
> On Mon, Mar 07, 2011 at 10:55:01AM +0100, Sascha Hagedorn wrote:
> > Hello everyone,
> >
> > I am evaluating a two node cluster setup and I am running into some
> > problems. The cluster runs a dual-master DRBD disk with an OCFS2
> > filesystem. Here are the software versions in use:
> >
> > - SLES11 + HAE Extension
>
> SLE11 is not supported anymore, you'd need to upgrade to SLE11SP1.
>
> > - DRBD 8.3.7
> > - OCFS2 1.4.2
> > - libdlm 3.00.01
> > - cluster-glue 1.0.5
> > - Pacemaker 1.1.2
> > - OpenAIS 1.1.2
> >
> > The problem occurs when the second node is powered off instantly by
> > pulling the power cable. Shortly after that, the load average on the
> > surviving system rises at a very high rate, with no CPU utilization,
> > until the server becomes unresponsive. Processes I see in the top list
> > very frequently are cib, dlm_controld, corosync and ha_logd. Access to
> > the DRBD partition is not possible, although crm_mon shows it is
> > mounted and all services are running. An "ls" on the DRBD OCFS2
> > partition results in a hanging prompt (so does "df" or any other
> > command accessing the partition).
>
> You created a split-brain condition, but have no stonith
> resources (and stonith is disabled). That won't work.
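In practice that means a fencing device which still works when the peer has lost all power: an out-of-band management board (IPMI/iLO/DRAC) or a switched PDU. A minimal sketch in crm shell syntax follows; the plugin choice (cluster-glue's external/ipmi) and all parameter values (the BMC address, userid, passwd, interface) are illustrative assumptions, not taken from this thread:

```
# Sketch only: fence cluster-node2 through its IPMI BMC.
# Address and credentials are placeholders; a mirror-image
# resource would be needed to fence cluster-node1.
primitive p_stonith_node2 stonith:external/ipmi \
        params hostname="cluster-node2" ipaddr="192.168.0.102" \
               userid="admin" passwd="secret" interface="lan" \
        op monitor interval="3600s"
# Keep the device off the node it is meant to shoot
location l_stonith_node2 p_stonith_node2 -inf: cluster-node2
# Re-enable stonith in the cluster options
property stonith-enabled="true"
```

With stonith enabled and a working device, dlm_controld only blocks until the failed node is confirmed fenced (the check_fencing step visible in the logs below), instead of waiting forever and hanging every access to the OCFS2 filesystem.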
>
> Thanks,
>
> Dejan
>
> > crm_mon after the power is cut on cluster-node2:
> >
> > ============
> > Last updated: Mon Mar  7 10:32:10 2011
> > Stack: openais
> > Current DC: cluster-node1 - partition WITHOUT quorum
> > Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> > 2 Nodes configured, 2 expected votes
> > 4 Resources configured.
> > ============
> >
> > Online: [ cluster-node1 ]
> > OFFLINE: [ cluster-node2 ]
> >
> > Master/Slave Set: ms_drbd
> >     Masters: [ cluster-node1 ]
> >     Stopped: [ p_drbd:1 ]
> > Clone Set: cl_dlm
> >     Started: [ cluster-node1 ]
> >     Stopped: [ p_dlm:1 ]
> > Clone Set: cl_o2cb
> >     Started: [ cluster-node1 ]
> >     Stopped: [ p_o2cb:1 ]
> > Clone Set: cl_fs
> >     Started: [ cluster-node1 ]
> >     Stopped: [ p_fs:1 ]
> >
> > The configuration is as follows:
> >
> > node cluster-node1
> > node cluster-node2
> > primitive p_dlm ocf:pacemaker:controld \
> >         op monitor interval="120s"
> > primitive p_drbd ocf:linbit:drbd \
> >         params drbd_resource="r0" \
> >         operations $id="p_drbd-operations" \
> >         op monitor interval="20" role="Master" timeout="20" \
> >         op monitor interval="30" role="Slave" timeout="20"
> > primitive p_fs ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd0" directory="/data" fstype="ocfs2" \
> >         op monitor interval="120s"
> > primitive p_o2cb ocf:ocfs2:o2cb \
> >         op monitor interval="120s"
> > ms ms_drbd p_drbd \
> >         meta resource-stickines="100" notify="true" master-max="2" interleave="true"
> > clone cl_dlm p_dlm \
> >         meta globally-unique="false" interleave="true"
> > clone cl_fs p_fs \
> >         meta interleave="true" ordered="true"
> > clone cl_o2cb p_o2cb \
> >         meta globally-unique="false" interleave="true"
> > colocation co_dlm-drbd inf: cl_dlm ms_drbd:Master
> > colocation co_fs-o2cb inf: cl_fs cl_o2cb
> > colocation co_o2cb-dlm inf: cl_o2cb cl_dlm
> > order o_dlm-o2cb 0: cl_dlm cl_o2cb
> > order o_drbd-dlm 0: ms_drbd:promote cl_dlm
> > order o_o2cb-fs 0: cl_o2cb cl_fs
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="2" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore"
> >
> > Here is a snippet from /var/log/messages (power cut at 10:32:02):
> >
> > Mar  7 10:32:03 cluster-node1 kernel: [ 4714.838629] r8169: eth0: link down
> > Mar  7 10:32:06 cluster-node1 corosync[4300]: [TOTEM ] A processor failed, forming new configuration.
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.748011] block drbd0: PingAck did not arrive in time.
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.748020] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.748031] block drbd0: asender terminated
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.748035] block drbd0: short read expecting header on sock: r=-512
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.748037] block drbd0: Terminating asender thread
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.748068] block drbd0: Creating new current UUID
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.763424] block drbd0: Connection closed
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.763429] block drbd0: conn( NetworkFailure -> Unconnected )
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.763434] block drbd0: receiver terminated
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.763436] block drbd0: Restarting receiver thread
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.763439] block drbd0: receiver (re)started
> > Mar  7 10:32:06 cluster-node1 kernel: [ 4717.763443] block drbd0: conn( Unconnected -> WFConnection )
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] CLM CONFIGURATION CHANGE
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] New Configuration:
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0) ip(10.140.1.1)
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Left:
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0) ip(10.140.1.2)
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Joined:
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 11840: memb=1, new=0, lost=1
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info: pcmk_peer_update: memb: cluster-node1 1
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info: pcmk_peer_update: lost: cluster-node2 2
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] CLM CONFIGURATION CHANGE
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] New Configuration:
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0) ip(10.140.1.1)
> > Mar  7 10:32:10 cluster-node1 cib: [4309]: notice: ais_dispatch: Membership 11840: quorum lost
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: notice: ais_dispatch: Membership 11840: quorum lost
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: update_cluster: Processing membership 11840
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Left:
> > Mar  7 10:32:10 cluster-node1 cib: [4309]: info: crm_update_peer: Node cluster-node2: id=2 state=lost (new) addr=r(0) ip(10.140.1.2) votes=1 born=11836 seen=11836 proc=00000000000000000000000000151312
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: ais_status_callback: status: cluster-node2 is now lost (was member)
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: dlm_process_node: Skipped active node 1: born-on=11780, last-seen=11840, this-event=11840, last-event=11836
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Joined:
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: crm_update_peer: Node cluster-node2: id=2 state=lost (new) addr=r(0) ip(10.140.1.2) votes=1 born=11836 seen=11836 proc=00000000000000000000000000151312
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: confchg called
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/2"
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 11840: memb=1, new=0, lost=0
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: erase_node_from_join: Removed node cluster-node2 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 1, joined 0
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: dlm_process_node: Removed inactive node 2: born-on=11836, last-seen=11836, this-event=11840, last-event=11836
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info: pcmk_peer_update: MEMB: cluster-node1 1
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: crm_update_quorum: Updating quorum status to false (call=634)
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: node daemon left 2
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: node down 2
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: log_config: dlm:controld conf 1 0 1 memb 1 join left 2
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info: ais_mark_unseen_peer_dead: Node cluster-node2 was not seen in the previous transition
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: Node 2 has left mountgroup 17633D496670435F99A9C3A12F3FFFF0
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: log_config: dlm:ls:17633D496670435F99A9C3A12F3FFFF0 conf 1 0 1 memb 1 join left 2
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: node_history_fail: 17633D496670435F99A9C3A12F3FFFF0 check_fs nodeid 2 set
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info: update_member: Node 2/cluster-node2 is now: lost
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: add_change: 17633D496670435F99A9C3A12F3FFFF0 add_change cg 19 remove nodeid 2 reason 3
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info: send_member_notification: Sending membership update 11840 to 4 children
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: add_change: 17633D496670435F99A9C3A12F3FFFF0 add_change cg 19 counts member 1 joined 0 remove 1 failed 1
> > Mar  7 10:32:10 cluster-node1 corosync[4300]: [MAIN ] Completed service synchronization, ready to provide service.
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: stop_kernel: 17633D496670435F99A9C3A12F3FFFF0 stop_kernel cg 19
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: do_sysfs: write "0" to "/sys/kernel/dlm/17633D496670435F99A9C3A12F3FFFF0/control"
> > Mar  7 10:32:10 cluster-node1 kernel: [ 4721.450691] dlm: closing connection to node 2
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: Sending notification of node 2 for "17633D496670435F99A9C3A12F3FFFF0"
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: confchg called
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: group "ocfs2:17633D496670435F99A9C3A12F3FFFF0" confchg: members 1, left 1, joined 0
> > Mar  7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/632, version=0.30.30): ok (rc=0)
> > Mar  7 10:32:10 cluster-node1 cib: [4309]: info: log_data_element: cib:diff: - <cib have-quorum="1" admin_epoch="0" epoch="30" num_updates="31" />
> > Mar  7 10:32:10 cluster-node1 cib: [4309]: info: log_data_element: cib:diff: + <cib have-quorum="0" admin_epoch="0" epoch="31" num_updates="1" />
> > Mar  7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/634, version=0.31.1): ok (rc=0)
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: crm_ais_dispatch: Setting expected votes to 2
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: WARN: match_down_event: No match for shutdown action on cluster-node2
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: te_update_diff: Stonith/shutdown of cluster-node2 not matched
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: abort_transition_graph: te_update_diff:194 - Triggered transition abort (complete=1, tag=node_state, id=cluster-node2, magic=NA, cib=0.30.31) : Node failure
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: fence_node_time: Node 2/cluster-node2 has not been shot yet
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: need_abort: Aborting on change to have-quorum
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: check_fencing_done: 17633D496670435F99A9C3A12F3FFFF0 check_fencing 2 not fenced add 1299490145 fence 0
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 637: Requesting the current CIB: S_POLICY_ENGINE
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 638: Requesting the current CIB: S_POLICY_ENGINE
> > Mar  7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/636, version=0.31.1): ok (rc=0)
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke_callback: Invoking the PE: query=638, ref=pe_calc-dc-1299490330-545, seq=11840, quorate=0
> > Mar  7 10:32:10 cluster-node1 cluster-dlm[5081]: set_fs_notified: 17633D496670435F99A9C3A12F3FFFF0 set_fs_notified nodeid 2
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: message from dlmcontrol
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_config: Startup probes: enabled
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: Notified for "17633D496670435F99A9C3A12F3FFFF0", node 2, status 0
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Mar  7 10:32:10 cluster-node1 ocfs2_controld[5544]: Completing notification on "17633D496670435F99A9C3A12F3FFFF0" for node 2
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_domains: Unpacking domains
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: determine_online_status: Node cluster-node1 is online
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Master/Slave Set: ms_drbd
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print: Masters: [ cluster-node1 ]
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print: Stopped: [ p_drbd:1 ]
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set: cl_dlm
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print: Started: [ cluster-node1 ]
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print: Stopped: [ p_dlm:1 ]
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set: cl_o2cb
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print: Started: [ cluster-node1 ]
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print: Stopped: [ p_o2cb:1 ]
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set: cl_fs
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print: Started: [ cluster-node1 ]
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print: Stopped: [ p_fs:1 ]
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource p_drbd:1 cannot run anywhere
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting p_drbd:0 (Master cluster-node1)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd: Promoted 1 instances of a possible 2 to master
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting p_drbd:0 (Master cluster-node1)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd: Promoted 1 instances of a possible 2 to master
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting p_drbd:0 (Master cluster-node1)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd: Promoted 1 instances of a possible 2 to master
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource p_dlm:1 cannot run anywhere
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource p_o2cb:1 cannot run anywhere
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource p_fs:1 cannot run anywhere
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_dlm:0 with p_drbd:0 on cluster-node1
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_drbd:0 with p_dlm:0
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_drbd:0 with p_dlm:0 on cluster-node1
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_dlm:0 with p_drbd:0
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_o2cb:0 with p_dlm:0 on cluster-node1
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_dlm:0 with p_o2cb:0
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_dlm:0 with p_o2cb:0 on cluster-node1
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_o2cb:0 with p_dlm:0
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_fs:0 with p_o2cb:0 on cluster-node1
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_o2cb:0 with p_fs:0
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_o2cb:0 with p_fs:0 on cluster-node1
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_fs:0 with p_o2cb:0
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_drbd:0 (Master cluster-node1)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_drbd:1 (Stopped)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_dlm:0 (Started cluster-node1)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_dlm:1 (Stopped)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_o2cb:0 (Started cluster-node1)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_o2cb:1 (Stopped)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_fs:0 (Started cluster-node1)
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_fs:1 (Stopped)
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: unpack_graph: Unpacked transition 62: 0 actions in 0 synapses
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_te_invoke: Processing graph 62 (ref=pe_calc-dc-1299490330-545) derived from /var/lib/pengine/pe-input-4730.bz2
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: run_graph: ====================================================
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: notice: run_graph: Transition 62 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4730.bz2): Complete
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: te_graph_trigger: Transition 62 is now complete
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: notify_crmd: Transition 62 status: done - <null>
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Mar  7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: Starting PEngine Recheck Timer
> > Mar  7 10:32:10 cluster-node1 cib: [10261]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-23.raw
> > Mar  7 10:32:10 cluster-node1 pengine: [4312]: info: process_pe_message: Transition 62: PEngine Input stored in: /var/lib/pengine/pe-input-4730.bz2
> > Mar  7 10:32:10 cluster-node1 cib: [10261]: info: write_cib_contents: Wrote version 0.31.0 of the CIB to disk (digest: 1360ea4c1e6d061a115b8efa6794189a)
> > Mar  7 10:32:10 cluster-node1 cib: [10261]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.ljbIv1 (digest: /var/lib/heartbeat/crm/cib.HFfG5H)
> > Mar  7 10:35:24 cluster-node1 cib: [4309]: info: cib_stats: Processed 660 operations (2333.00us average, 0% utilization) in the last 10min
> > Mar  7 10:45:24 cluster-node1 cib: [4309]: info: cib_stats: Processed 629 operations (651.00us average, 0% utilization) in the last 10min
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 639: Requesting the current CIB: S_POLICY_ENGINE
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_pe_invoke_callback: Invoking the PE: query=639, ref=pe_calc-dc-1299491230-546, seq=11840, quorate=0
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_config: Startup probes: enabled
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_domains: Unpacking domains
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: determine_online_status: Node cluster-node1 is online
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Master/Slave Set: ms_drbd
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print: Masters: [ cluster-node1 ]
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print: Stopped: [ p_drbd:1 ]
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set: cl_dlm
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print: Started: [ cluster-node1 ]
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print: Stopped: [ p_dlm:1 ]
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set: cl_o2cb
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print: Started: [ cluster-node1 ]
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print: Stopped: [ p_o2cb:1 ]
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set: cl_fs
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print: Started: [ cluster-node1 ]
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print: Stopped: [ p_fs:1 ]
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource p_drbd:1 cannot run anywhere
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting p_drbd:0 (Master cluster-node1)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd: Promoted 1 instances of a possible 2 to master
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting p_drbd:0 (Master cluster-node1)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd: Promoted 1 instances of a possible 2 to master
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting p_drbd:0 (Master cluster-node1)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd: Promoted 1 instances of a possible 2 to master
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource p_dlm:1 cannot run anywhere
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource p_o2cb:1 cannot run anywhere
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource p_fs:1 cannot run anywhere
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_dlm:0 with p_drbd:0 on cluster-node1
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_drbd:0 with p_dlm:0
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_drbd:0 with p_dlm:0 on cluster-node1
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_dlm:0 with p_drbd:0
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_o2cb:0 with p_dlm:0 on cluster-node1
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_dlm:0 with p_o2cb:0
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_dlm:0 with p_o2cb:0 on cluster-node1
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_o2cb:0 with p_dlm:0
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_fs:0 with p_o2cb:0 on cluster-node1
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_o2cb:0 with p_fs:0
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child: Colocating p_o2cb:0 with p_fs:0 on cluster-node1
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh: Interleaving p_fs:0 with p_o2cb:0
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_drbd:0 (Master cluster-node1)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_drbd:1 (Stopped)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_dlm:0 (Started cluster-node1)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_dlm:1 (Stopped)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_o2cb:0 (Started cluster-node1)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_o2cb:1 (Stopped)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_fs:0 (Started cluster-node1)
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave resource p_fs:1 (Stopped)
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: unpack_graph: Unpacked transition 63: 0 actions in 0 synapses
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_te_invoke: Processing graph 63 (ref=pe_calc-dc-1299491230-546) derived from /var/lib/pengine/pe-input-4731.bz2
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: run_graph: ====================================================
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: notice: run_graph: Transition 63 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4731.bz2): Complete
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: te_graph_trigger: Transition 63 is now complete
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: notify_crmd: Transition 63 status: done - <null>
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Mar  7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: Starting PEngine Recheck Timer
> > Mar  7 10:47:10 cluster-node1 pengine: [4312]: info: process_pe_message: Transition 63: PEngine Input stored in: /var/lib/pengine/pe-input-4731.bz2
> >
> > Any help is appreciated.
> >
> > Thank you and kind regards,
> > Sascha Hagedorn

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
