Hello everyone,
I am evaluating a two-node cluster setup and am running into a problem. The
cluster runs a dual-primary DRBD device with an OCFS2 filesystem on top. These
are the software versions in use:
- SLES 11 with the HA Extension (HAE)
- DRBD 8.3.7
- OCFS2 1.4.2
- libdlm 3.00.01
- cluster-glue 1.0.5
- Pacemaker 1.1.2
- OpenAIS 1.1.2
The problem occurs when the second node is powered off abruptly by pulling the
power cable. Shortly afterwards the load average on the surviving node rises
rapidly, with no corresponding CPU utilization, until the server becomes
unresponsive. The processes I see most often in top are cib, dlm_controld,
corosync and ha_logd. Access to the DRBD partition is no longer possible,
although crm_mon shows it as mounted and all services as running. An "ls" on
the OCFS2 filesystem on DRBD hangs indefinitely (as does "df" or any other
command that touches the partition).
crm_mon after the power is cut on cluster-node2:
============
Last updated: Mon Mar 7 10:32:10 2011
Stack: openais
Current DC: cluster-node1 - partition WITHOUT quorum
Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ cluster-node1 ]
OFFLINE: [ cluster-node2 ]
 Master/Slave Set: ms_drbd
     Masters: [ cluster-node1 ]
     Stopped: [ p_drbd:1 ]
 Clone Set: cl_dlm
     Started: [ cluster-node1 ]
     Stopped: [ p_dlm:1 ]
 Clone Set: cl_o2cb
     Started: [ cluster-node1 ]
     Stopped: [ p_o2cb:1 ]
 Clone Set: cl_fs
     Started: [ cluster-node1 ]
     Stopped: [ p_fs:1 ]
The configuration is as follows:
node cluster-node1
node cluster-node2
primitive p_dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive p_drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        operations $id="p_drbd-operations" \
        op monitor interval="20" role="Master" timeout="20" \
        op monitor interval="30" role="Slave" timeout="20"
primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/data" fstype="ocfs2" \
        op monitor interval="120s"
primitive p_o2cb ocf:ocfs2:o2cb \
        op monitor interval="120s"
ms ms_drbd p_drbd \
        meta resource-stickines="100" notify="true" master-max="2" interleave="true"
clone cl_dlm p_dlm \
        meta globally-unique="false" interleave="true"
clone cl_fs p_fs \
        meta interleave="true" ordered="true"
clone cl_o2cb p_o2cb \
        meta globally-unique="false" interleave="true"
colocation co_dlm-drbd inf: cl_dlm ms_drbd:Master
colocation co_fs-o2cb inf: cl_fs cl_o2cb
colocation co_o2cb-dlm inf: cl_o2cb cl_dlm
order o_dlm-o2cb 0: cl_dlm cl_o2cb
order o_drbd-dlm 0: ms_drbd:promote cl_dlm
order o_o2cb-fs 0: cl_o2cb cl_fs
property $id="cib-bootstrap-options" \
        dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
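One thing I am aware of is that stonith-enabled is currently set to "false".
In case the hang is related to the missing fencing, this is roughly the kind
of fencing setup I have been considering. All values below are placeholders
and the external/ipmi parameters are written from memory, so treat it as a
sketch only:

  # placeholders only - one fencing primitive per node, IPMI address/credentials are examples
  crm configure primitive stonith_node2 stonith:external/ipmi \
          params hostname="cluster-node2" ipaddr="10.140.1.102" userid="admin" passwd="secret" \
          op monitor interval="60s"
  # keep the fencing resource for node2 off node2 itself
  crm configure location l_stonith_node2 stonith_node2 -inf: cluster-node2
  crm configure property stonith-enabled="true"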
Here is a snippet from /var/log/messages (power cut at 10:32:02):
Mar 7 10:32:03 cluster-node1 kernel: [ 4714.838629] r8169: eth0: link down
Mar 7 10:32:06 cluster-node1 corosync[4300]: [TOTEM ] A processor failed,
forming new configuration.
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748011] block drbd0: PingAck did
not arrive in time.
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748020] block drbd0: peer( Primary
-> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748031] block drbd0: asender
terminated
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748035] block drbd0: short read
expecting header on sock: r=-512
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748037] block drbd0: Terminating
asender thread
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748068] block drbd0: Creating new
current UUID
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763424] block drbd0: Connection
closed
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763429] block drbd0: conn(
NetworkFailure -> Unconnected )
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763434] block drbd0: receiver
terminated
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763436] block drbd0: Restarting
receiver thread
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763439] block drbd0: receiver
(re)started
Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763443] block drbd0: conn(
Unconnected -> WFConnection )
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] CLM CONFIGURATION
CHANGE
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] New Configuration:
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0) ip(10.140.1.1)
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Left:
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0) ip(10.140.1.2)
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Joined:
Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] notice:
pcmk_peer_update: Transitional membership event on ring 11840: memb=1, new=0,
lost=1
Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
pcmk_peer_update: memb: cluster-node1 1
Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
pcmk_peer_update: lost: cluster-node2 2
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] CLM CONFIGURATION
CHANGE
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] New Configuration:
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0) ip(10.140.1.1)
Mar 7 10:32:10 cluster-node1 cib: [4309]: notice: ais_dispatch: Membership
11840: quorum lost
Mar 7 10:32:10 cluster-node1 crmd: [4313]: notice: ais_dispatch: Membership
11840: quorum lost
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: update_cluster: Processing
membership 11840
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Left:
Mar 7 10:32:10 cluster-node1 cib: [4309]: info: crm_update_peer: Node
cluster-node2: id=2 state=lost (new) addr=r(0) ip(10.140.1.2) votes=1
born=11836 seen=11836 proc=00000000000000000000000000151312
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: ais_status_callback: status:
cluster-node2 is now lost (was member)
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: dlm_process_node: Skipped
active node 1: born-on=11780, last-seen=11840, this-event=11840,
last-event=11836
Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Joined:
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: crm_update_peer: Node
cluster-node2: id=2 state=lost (new) addr=r(0) ip(10.140.1.2) votes=1
born=11836 seen=11836 proc=00000000000000000000000000151312
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: confchg called
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: del_configfs_node:
del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/2"
Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] notice:
pcmk_peer_update: Stable membership event on ring 11840: memb=1, new=0, lost=0
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: erase_node_from_join: Removed
node cluster-node2 from join calculations: welcomed=0 itegrated=0 finalized=0
confirmed=1
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: ocfs2_controld (group
"ocfs2:controld") confchg: members 1, left 1, joined 0
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: dlm_process_node: Removed
inactive node 2: born-on=11836, last-seen=11836, this-event=11840,
last-event=11836
Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
pcmk_peer_update: MEMB: cluster-node1 1
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: crm_update_quorum: Updating
quorum status to false (call=634)
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: node daemon left 2
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: node down 2
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: log_config: dlm:controld conf
1 0 1 memb 1 join left 2
Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
ais_mark_unseen_peer_dead: Node cluster-node2 was not seen in the previous
transition
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: Node 2 has left mountgroup
17633D496670435F99A9C3A12F3FFFF0
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: log_config:
dlm:ls:17633D496670435F99A9C3A12F3FFFF0 conf 1 0 1 memb 1 join left 2
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: node_history_fail:
17633D496670435F99A9C3A12F3FFFF0 check_fs nodeid 2 set
Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info: update_member:
Node 2/cluster-node2 is now: lost
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: add_change:
17633D496670435F99A9C3A12F3FFFF0 add_change cg 19 remove nodeid 2 reason 3
Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
send_member_notification: Sending membership update 11840 to 4 children
Mar 7 10:32:10 cluster-node1 corosync[4300]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: add_change:
17633D496670435F99A9C3A12F3FFFF0 add_change cg 19 counts member 1 joined 0
remove 1 failed 1
Mar 7 10:32:10 cluster-node1 corosync[4300]: [MAIN ] Completed service
synchronization, ready to provide service.
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: stop_kernel:
17633D496670435F99A9C3A12F3FFFF0 stop_kernel cg 19
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: do_sysfs: write "0" to
"/sys/kernel/dlm/17633D496670435F99A9C3A12F3FFFF0/control"
Mar 7 10:32:10 cluster-node1 kernel: [ 4721.450691] dlm: closing connection to
node 2
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: Sending notification of
node 2 for "17633D496670435F99A9C3A12F3FFFF0"
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: confchg called
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: group
"ocfs2:17633D496670435F99A9C3A12F3FFFF0" confchg: members 1, left 1, joined 0
Mar 7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/632,
version=0.30.30): ok (rc=0)
Mar 7 10:32:10 cluster-node1 cib: [4309]: info: log_data_element: cib:diff: -
<cib have-quorum="1" admin_epoch="0" epoch="30" num_updates="31" />
Mar 7 10:32:10 cluster-node1 cib: [4309]: info: log_data_element: cib:diff: +
<cib have-quorum="0" admin_epoch="0" epoch="31" num_updates="1" />
Mar 7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/634,
version=0.31.1): ok (rc=0)
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: crm_ais_dispatch: Setting
expected votes to 2
Mar 7 10:32:10 cluster-node1 crmd: [4313]: WARN: match_down_event: No match
for shutdown action on cluster-node2
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: te_update_diff:
Stonith/shutdown of cluster-node2 not matched
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: abort_transition_graph:
te_update_diff:194 - Triggered transition abort (complete=1, tag=node_state,
id=cluster-node2, magic=NA, cib=0.30.31) : Node failure
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: fence_node_time: Node
2/cluster-node2 has not been shot yet
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: need_abort: Aborting on
change to have-quorum
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: check_fencing_done:
17633D496670435F99A9C3A12F3FFFF0 check_fencing 2 not fenced add 1299490145
fence 0
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 637:
Requesting the current CIB: S_POLICY_ENGINE
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 638:
Requesting the current CIB: S_POLICY_ENGINE
Mar 7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/636,
version=0.31.1): ok (rc=0)
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke_callback:
Invoking the PE: query=638, ref=pe_calc-dc-1299490330-545, seq=11840, quorate=0
Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: set_fs_notified:
17633D496670435F99A9C3A12F3FFFF0 set_fs_notified nodeid 2
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: message from dlmcontrol
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_config: Startup
probes: enabled
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: Notified for
"17633D496670435F99A9C3A12F3FFFF0", node 2, status 0
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: Completing notification on
"17633D496670435F99A9C3A12F3FFFF0" for node 2
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_config: Node
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_domains: Unpacking
domains
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: determine_online_status:
Node cluster-node1 is online
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print:
Master/Slave Set: ms_drbd
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
Masters: [ cluster-node1 ]
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
Stopped: [ p_drbd:1 ]
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set:
cl_dlm
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
Started: [ cluster-node1 ]
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
Stopped: [ p_dlm:1 ]
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set:
cl_o2cb
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
Started: [ cluster-node1 ]
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
Stopped: [ p_o2cb:1 ]
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set:
cl_fs
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
Started: [ cluster-node1 ]
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
Stopped: [ p_fs:1 ]
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource
p_drbd:1 cannot run anywhere
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
p_drbd:0 (Master cluster-node1)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
Promoted 1 instances of a possible 2 to master
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
p_drbd:0 (Master cluster-node1)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
Promoted 1 instances of a possible 2 to master
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
p_drbd:0 (Master cluster-node1)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
Promoted 1 instances of a possible 2 to master
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource
p_dlm:1 cannot run anywhere
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource
p_o2cb:1 cannot run anywhere
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource
p_fs:1 cannot run anywhere
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_dlm:0 with p_drbd:0 on cluster-node1
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_drbd:0 with p_dlm:0
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_drbd:0 with p_dlm:0 on cluster-node1
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_dlm:0 with p_drbd:0
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_o2cb:0 with p_dlm:0 on cluster-node1
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_dlm:0 with p_o2cb:0
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_dlm:0 with p_o2cb:0 on cluster-node1
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_o2cb:0 with p_dlm:0
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_fs:0 with p_o2cb:0 on cluster-node1
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_o2cb:0 with p_fs:0
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_o2cb:0 with p_fs:0 on cluster-node1
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_fs:0 with p_o2cb:0
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_drbd:0 (Master cluster-node1)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_drbd:1 (Stopped)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_dlm:0 (Started cluster-node1)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_dlm:1 (Stopped)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_o2cb:0 (Started cluster-node1)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_o2cb:1 (Stopped)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_fs:0 (Started cluster-node1)
Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_fs:1 (Stopped)
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: unpack_graph: Unpacked
transition 62: 0 actions in 0 synapses
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_te_invoke: Processing
graph 62 (ref=pe_calc-dc-1299490330-545) derived from
/var/lib/pengine/pe-input-4730.bz2
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: run_graph:
====================================================
Mar 7 10:32:10 cluster-node1 crmd: [4313]: notice: run_graph: Transition 62
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-4730.bz2): Complete
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: te_graph_trigger: Transition
62 is now complete
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: notify_crmd: Transition 62
status: done - <null>
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: Starting
PEngine Recheck Timer
Mar 7 10:32:10 cluster-node1 cib: [10261]: info: write_cib_contents: Archived
previous version as /var/lib/heartbeat/crm/cib-23.raw
Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: process_pe_message:
Transition 62: PEngine Input stored in: /var/lib/pengine/pe-input-4730.bz2
Mar 7 10:32:10 cluster-node1 cib: [10261]: info: write_cib_contents: Wrote
version 0.31.0 of the CIB to disk (digest: 1360ea4c1e6d061a115b8efa6794189a)
Mar 7 10:32:10 cluster-node1 cib: [10261]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.ljbIv1 (digest:
/var/lib/heartbeat/crm/cib.HFfG5H)
Mar 7 10:35:24 cluster-node1 cib: [4309]: info: cib_stats: Processed 660
operations (2333.00us average, 0% utilization) in the last 10min
Mar 7 10:45:24 cluster-node1 cib: [4309]: info: cib_stats: Processed 629
operations (651.00us average, 0% utilization) in the last 10min
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: crm_timer_popped: PEngine
Recheck Timer (I_PE_CALC) just popped!
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition:
Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: All 1
cluster nodes are eligible to run resources.
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 639:
Requesting the current CIB: S_POLICY_ENGINE
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_pe_invoke_callback:
Invoking the PE: query=639, ref=pe_calc-dc-1299491230-546, seq=11840, quorate=0
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_config: Startup
probes: enabled
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_config: Node
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_domains: Unpacking
domains
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: determine_online_status:
Node cluster-node1 is online
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print:
Master/Slave Set: ms_drbd
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
Masters: [ cluster-node1 ]
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
Stopped: [ p_drbd:1 ]
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set:
cl_dlm
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
Started: [ cluster-node1 ]
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
Stopped: [ p_dlm:1 ]
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set:
cl_o2cb
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
Started: [ cluster-node1 ]
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
Stopped: [ p_o2cb:1 ]
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone Set:
cl_fs
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
Started: [ cluster-node1 ]
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
Stopped: [ p_fs:1 ]
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource
p_drbd:1 cannot run anywhere
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
p_drbd:0 (Master cluster-node1)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
Promoted 1 instances of a possible 2 to master
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
p_drbd:0 (Master cluster-node1)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
Promoted 1 instances of a possible 2 to master
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
p_drbd:0 (Master cluster-node1)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
Promoted 1 instances of a possible 2 to master
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource
p_dlm:1 cannot run anywhere
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource
p_o2cb:1 cannot run anywhere
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource
p_fs:1 cannot run anywhere
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_dlm:0 with p_drbd:0 on cluster-node1
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_drbd:0 with p_dlm:0
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_drbd:0 with p_dlm:0 on cluster-node1
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_dlm:0 with p_drbd:0
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_o2cb:0 with p_dlm:0 on cluster-node1
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_dlm:0 with p_o2cb:0
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_dlm:0 with p_o2cb:0 on cluster-node1
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_o2cb:0 with p_dlm:0
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_fs:0 with p_o2cb:0 on cluster-node1
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_o2cb:0 with p_fs:0
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
Colocating p_o2cb:0 with p_fs:0 on cluster-node1
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
Interleaving p_fs:0 with p_o2cb:0
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_drbd:0 (Master cluster-node1)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_drbd:1 (Stopped)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_dlm:0 (Started cluster-node1)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_dlm:1 (Stopped)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_o2cb:0 (Started cluster-node1)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_o2cb:1 (Stopped)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_fs:0 (Started cluster-node1)
Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
resource p_fs:1 (Stopped)
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: unpack_graph: Unpacked
transition 63: 0 actions in 0 synapses
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_te_invoke: Processing
graph 63 (ref=pe_calc-dc-1299491230-546) derived from
/var/lib/pengine/pe-input-4731.bz2
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: run_graph:
====================================================
Mar 7 10:47:10 cluster-node1 crmd: [4313]: notice: run_graph: Transition 63
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-4731.bz2): Complete
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: te_graph_trigger: Transition
63 is now complete
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: notify_crmd: Transition 63
status: done - <null>
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: Starting
PEngine Recheck Timer
Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: process_pe_message:
Transition 63: PEngine Input stored in: /var/lib/pengine/pe-input-4731.bz2
Any help is appreciated.
Thank you and kind regards,
Sascha Hagedorn