Dominik,
As usual, you are right on the money. I should have caught that myself; thank
you for catching it. What happened was that I used a different server to
compile DRBD, and I had assumed that Nomen and Rubric (my test nodes) were on
the same kernel.
I had also combined Neil's suggestion with yours, as he had mentioned that
pacemaker-1.0.1 works with drbd-8.2.
My current issues are as follows:
1) I cannot migrate the resource fs0 from Nomen to Rubric. Running the
command "crm resource migrate fs0" just puts fs0 into an offline state. This
sounds like a configuration issue. NOTE: I am planning to add fs0 to a group
that will be able to migrate between the two nodes (Nomen and Rubric). Help.
Please provide the crm(live) syntax, as I have tried the commands below and
crm complains that the syntax is wrong.
order ms-drbd0-before-fs0 mandatory: ms-drbd0:promote fs0:start
colocation fs0-on-ms-drbd0 inf: fs0 ms-drbd0:Master
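For reference, here is the overall shape of what I am trying to end up with,
as best I can guess the syntax (the group name g0 is made up by me; crm
rejects some variant of this, so please correct whatever is wrong):

```
# crm(live) configure sketch -- my best guess, not verified syntax.
# g0 is a hypothetical group name; fs0 and VIP are the primitives below.
group g0 fs0 VIP
order ms-drbd0-before-g0 inf: ms-drbd0:promote g0:start
colocation g0-on-ms-drbd0 inf: g0 ms-drbd0:Master
```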
2) Is there documentation for the resources, constraints, and other objects I
can add to the cib.xml via crm(live), and for the syntax to add them via
crm(live)?
Help.
Thank you in advance.
FYI, below is my current configuration as well as logs during the migration
test.
#######################
#Current Configuration#
#######################
Installed Applications:
=======================
drbd-8.2.7-3
drbd-km-2.6.18_128.1.1.el5-8.2.7-3
heartbeat-2.99.2-6.1
pacemaker-1.0.1-3.1
kernel-2.6.18-128.1.1.el5
drbd.conf:
==========
global {
usage-count no;
}
resource r0 {
protocol C;
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD
Alert' root";
out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
}
startup {
wfc-timeout 0;
}
disk {
on-io-error pass_on;
}
net {
max-buffers 2048;
cram-hmac-alg "sha1";
shared-secret "FooFunFactory";
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
syncer {
rate 100M;
al-extents 257;
}
on nomen.esri.com {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.0.1:7789;
meta-disk internal;
}
on rubric.esri.com {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.0.2:7789;
meta-disk internal;
}
}
ha.cf:
======
# Logging
debug 3
use_logd false
logfacility daemon
# Misc Options
traditional_compression off
compression bz2
coredumps true
# Communications
udpport 691
bcast eth1
autojoin any
# Thresholds (in seconds)
keepalive 1
warntime 6
deadtime 10
initdead 15
ping 10.50.254.254
crm respawn
apiauth mgmtd uid=root
respawn root /usr/lib/heartbeat/mgmtd -v
cib.xml:
========
<cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
have-quorum="1" epoch="153" num_updates="0" cib-last-written="Fri Mar 6
12:52:27 2009" dc-uuid="3a8b681c-a14b-4037-a8e6-2d4af2eff88e">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh"
name="last-lrm-refresh" value="1236213117"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="3a8b681c-a14b-4037-a8e6-2d4af2eff88e" uname="nomen.esri.com"
type="normal"/>
<node id="a5e95310-f27d-418e-9cb9-42e50310f702" uname="rubric.esri.com"
type="normal"/>
</nodes>
<resources>
<master id="ms-drbd0">
<meta_attributes id="ms-drbd0-meta_attributes">
<nvpair id="ms-drbd0-meta_attributes-clone-max" name="clone-max"
value="2"/>
<nvpair id="ms-drbd0-meta_attributes-notify" name="notify"
value="true"/>
<nvpair id="ms-drbd0-meta_attributes-globally-unique"
name="globally-unique" value="false"/>
<nvpair id="ms-drbd0-meta_attributes-target-role" name="target-role"
value="Started"/>
</meta_attributes>
<primitive class="ocf" id="drbd0" provider="heartbeat" type="drbd">
<instance_attributes id="drbd0-instance_attributes">
<nvpair id="drbd0-instance_attributes-drbd_resource"
name="drbd_resource" value="r0"/>
</instance_attributes>
<operations id="drbd0-ops">
<op id="drbd0-monitor-59s" interval="59s" name="monitor"
role="Master" timeout="30s"/>
<op id="drbd0-monitor-60s" interval="60s" name="monitor"
role="Slave" timeout="30s"/>
</operations>
</primitive>
</master>
<primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
<instance_attributes id="VIP-instance_attributes">
<nvpair id="VIP-instance_attributes-ip" name="ip"
value="10.50.26.250"/>
</instance_attributes>
<operations id="VIP-ops">
<op id="VIP-monitor-5s" interval="5s" name="monitor" timeout="5s"/>
</operations>
</primitive>
<primitive class="ocf" id="fs0" provider="heartbeat" type="Filesystem">
<instance_attributes id="fs0-instance_attributes">
<nvpair id="fs0-instance_attributes-fstype" name="fstype"
value="ext3"/>
<nvpair id="fs0-instance_attributes-directory" name="directory"
value="/data"/>
<nvpair id="fs0-instance_attributes-device" name="device"
value="/dev/drbd0"/>
</instance_attributes>
</primitive>
</resources>
<constraints/>
</configuration>
</cib>
messages:
==================
Mar 6 12:56:07 nomen lrmd: [14509]: info: Resource Agent output: []
Mar 6 12:56:08 nomen crm_shadow: [1551]: info: Invoked: crm_shadow
Mar 6 12:56:08 nomen crm_shadow: [1565]: info: Invoked: crm_shadow
Mar 6 12:56:08 nomen crm_resource: [1566]: info: Invoked: crm_resource -M -r
fs0
Mar 6 12:56:09 nomen cib: [14508]: info: cib_process_request: Operation
complete: op cib_delete for section constraints (origin=local/crm_resource/3):
ok (rc=0)
Mar 6 12:56:09 nomen haclient: on_event:evt:cib_changed
Mar 6 12:56:09 nomen crmd: [14603]: info: abort_transition_graph:
need_abort:60 - Triggered transition abort (complete=1) : Non-status change
Mar 6 12:56:09 nomen crmd: [14603]: info: need_abort: Aborting on change to
epoch
Mar 6 12:56:09 nomen crmd: [14603]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Mar 6 12:56:09 nomen crmd: [14603]: info: do_state_transition: All 2 cluster
nodes are eligible to run resources.
Mar 6 12:56:09 nomen crmd: [14603]: info: do_pe_invoke: Query 112: Requesting
the current CIB: S_POLICY_ENGINE
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: - <cib
epoch="153" num_updates="2" />
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: + <cib
epoch="154" num_updates="1" >
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
<configuration >
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
<constraints >
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
<rsc_location id="cli-standby-fs0" rsc="fs0" __crm_diff_marker__="added:top" >
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
<rule id="cli-standby-rule-fs0" score="-INFINITY" boolean-op="and" >
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
<expression id="cli-standby-expr-fs0" attribute="#uname" operation="eq"
value="nomen.esri.com" type="string" />
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
</rule>
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
</rsc_location>
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
</constraints>
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: +
</configuration>
Mar 6 12:56:09 nomen cib: [14508]: info: log_data_element: cib:diff: + </cib>
Mar 6 12:56:09 nomen cib: [14508]: info: cib_process_request: Operation
complete: op cib_modify for section constraints (origin=local/crm_resource/4):
ok (rc=0)
Mar 6 12:56:09 nomen crmd: [14603]: info: do_pe_invoke_callback: Invoking the
PE: ref=pe_calc-dc-1236372969-107, seq=2, quorate=1
Mar 6 12:56:09 nomen pengine: [14645]: WARN: unpack_resources: No STONITH
resources have been defined
Mar 6 12:56:09 nomen pengine: [14645]: info: determine_online_status: Node
rubric.esri.com is online
Mar 6 12:56:09 nomen pengine: [14645]: info: unpack_rsc_op: fs0_start_0 on
rubric.esri.com returned 1 (unknown error) instead of the expected value: 0 (ok)
Mar 6 12:56:09 nomen pengine: [14645]: WARN: unpack_rsc_op: Processing failed
op fs0_start_0 on rubric.esri.com: Error
Mar 6 12:56:09 nomen pengine: [14645]: WARN: unpack_rsc_op: Compatibility
handling for failed op fs0_start_0 on rubric.esri.com
Mar 6 12:56:09 nomen pengine: [14645]: info: determine_online_status: Node
nomen.esri.com is online
Mar 6 12:56:09 nomen pengine: [14645]: notice: clone_print: Master/Slave Set:
ms-drbd0
Mar 6 12:56:09 nomen pengine: [14645]: notice: native_print: drbd0:0
(ocf::heartbeat:drbd): Master nomen.esri.com
Mar 6 12:56:09 nomen pengine: [14645]: notice: native_print: drbd0:1
(ocf::heartbeat:drbd): Started rubric.esri.com
Mar 6 12:56:09 nomen pengine: [14645]: notice: native_print: VIP
(ocf::heartbeat:IPaddr): Started nomen.esri.com
Mar 6 12:56:09 nomen pengine: [14645]: notice: native_print: fs0
(ocf::heartbeat:Filesystem):Started nomen.esri.com
Mar 6 12:56:09 nomen pengine: [14645]: info: get_failcount: fs0 has failed
1000000 times on rubric.esri.com
Mar 6 12:56:09 nomen pengine: [14645]: info: master_color: Promoting drbd0:0
(Master nomen.esri.com)
Mar 6 12:56:09 nomen pengine: [14645]: info: master_color: ms-drbd0: Promoted
1 instances of a possible 1 to master
Mar 6 12:56:09 nomen pengine: [14645]: WARN: native_color: Resource fs0 cannot
run anywhere
Mar 6 12:56:09 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
drbd0:0 (Master nomen.esri.com)
Mar 6 12:56:09 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
drbd0:1 (Slave rubric.esri.com)
Mar 6 12:56:09 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
drbd0:0 (Master nomen.esri.com)
Mar 6 12:56:09 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
drbd0:1 (Slave rubric.esri.com)
Mar 6 12:56:09 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
VIP (Started nomen.esri.com)
Mar 6 12:56:09 nomen pengine: [14645]: notice: NoRoleChange: Stop resource fs0
(Started nomen.esri.com)
Mar 6 12:56:09 nomen pengine: [14645]: notice: StopRsc: nomen.esri.com
Stop fs0
Mar 6 12:56:09 nomen mgmtd: [14526]: info: CIB query: cib
Mar 6 12:56:09 nomen crmd: [14603]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Mar 6 12:56:09 nomen pengine: [14645]: WARN: process_pe_message: Transition
34: WARNINGs found during PE processing. PEngine Input stored in:
/var/lib/heartbeat/pengine/pe-warn-37.bz2
Mar 6 12:56:09 nomen pengine: [14645]: info: process_pe_message: Configuration
WARNINGs found during PE processing. Please run "crm_verify -L" to identify
issues.
Mar 6 12:56:09 nomen crmd: [14603]: info: unpack_graph: Unpacked transition
34: 2 actions in 2 synapses
Mar 6 12:56:09 nomen crmd: [14603]: info: do_te_invoke: Processing graph 34
(ref=pe_calc-dc-1236372969-107) derived from
/var/lib/heartbeat/pengine/pe-warn-37.bz2
Mar 6 12:56:09 nomen crmd: [14603]: info: send_rsc_command: Initiating action
41: stop fs0_stop_0 on nomen.esri.com
Mar 6 12:56:09 nomen cib: [1567]: info: write_cib_contents: Wrote version
0.154.0 of the CIB to disk (digest: bbea2bdada182cedaa9f52f91c178cdb)
Mar 6 12:56:09 nomen cib: [1567]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Mar 6 12:56:09 nomen cib: [14508]: info: Managed write_cib_contents process
1567 exited with return code 0.
Mar 6 12:56:09 nomen crmd: [14603]: info: do_lrm_rsc_op: Performing
key=41:34:0:44aada21-7997-4a4f-ba9a-4ae8a2629a58 op=fs0_stop_0 )
Mar 6 12:56:09 nomen lrmd: [14509]: info: rsc:fs0: stop
Mar 6 12:56:09 nomen Filesystem[1569]: INFO: Running stop for /dev/drbd0 on
/data
Mar 6 12:56:09 nomen Filesystem[1569]: INFO: Trying to unmount /data
Mar 6 12:56:09 nomen Filesystem[1569]: INFO: unmounted /data successfully
Mar 6 12:56:09 nomen lrmd: [14509]: info: Managed fs0:stop process 1569 exited
with return code 0.
Mar 6 12:56:09 nomen lrmd: [14509]: info: Resource Agent output: []
Mar 6 12:56:09 nomen crmd: [14603]: info: process_lrm_event: LRM operation
fs0_stop_0 (call=46, rc=0, cib-update=113, confirmed=true) complete ok
Mar 6 12:56:09 nomen haclient: on_event:evt:cib_changed
Mar 6 12:56:09 nomen crmd: [14603]: info: match_graph_event: Action fs0_stop_0
(41) confirmed on nomen.esri.com (rc=0)
Mar 6 12:56:09 nomen crmd: [14603]: info: te_pseudo_action: Pseudo action 4
fired and confirmed
Mar 6 12:56:09 nomen crmd: [14603]: info: run_graph:
====================================================
Mar 6 12:56:09 nomen crmd: [14603]: notice: run_graph: Transition 34
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/heartbeat/pengine/pe-warn-37.bz2): Complete
Mar 6 12:56:09 nomen crmd: [14603]: info: te_graph_trigger: Transition 34 is
now complete
Mar 6 12:56:09 nomen cib: [14508]: info: cib_process_request: Operation
complete: op cib_modify for section 'all' (origin=local/crmd/113): ok (rc=0)
Mar 6 12:56:09 nomen crmd: [14603]: info: notify_crmd: Transition 34 status:
done - <null>
Mar 6 12:56:09 nomen crmd: [14603]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 6 12:56:10 nomen mgmtd: [14526]: info: CIB query: cib
Mar 6 12:56:12 nomen lrmd: [14509]: info: Resource Agent output: []
Mar 6 12:56:14 nomen crm_shadow: [1662]: info: Invoked: crm_shadow
Mar 6 12:56:14 nomen crm_shadow: [1676]: info: Invoked: crm_shadow
Mar 6 12:56:14 nomen crm_resource: [1677]: info: Invoked: crm_resource -U -r
fs0
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: - <cib
epoch="154" num_updates="2" >
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
<configuration >
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
<constraints >
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
<rsc_location id="cli-standby-fs0" rsc="fs0" __crm_diff_marker__="removed:top" >
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
<rule id="cli-standby-rule-fs0" score="-INFINITY" boolean-op="and" >
Mar 6 12:56:14 nomen haclient: on_event:evt:cib_changed
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
<expression id="cli-standby-expr-fs0" attribute="#uname" operation="eq"
value="nomen.esri.com" type="string" />
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
</rule>
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
</rsc_location>
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
</constraints>
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: -
</configuration>
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: - </cib>
Mar 6 12:56:14 nomen cib: [14508]: info: log_data_element: cib:diff: + <cib
epoch="155" num_updates="1" />
Mar 6 12:56:14 nomen cib: [14508]: info: cib_process_request: Operation
complete: op cib_delete for section constraints (origin=local/crm_resource/3):
ok (rc=0)
Mar 6 12:56:14 nomen crmd: [14603]: info: abort_transition_graph:
need_abort:60 - Triggered transition abort (complete=1) : Non-status change
Mar 6 12:56:14 nomen crmd: [14603]: info: need_abort: Aborting on change to
epoch
Mar 6 12:56:14 nomen crmd: [14603]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Mar 6 12:56:14 nomen crmd: [14603]: info: do_state_transition: All 2 cluster
nodes are eligible to run resources.
Mar 6 12:56:14 nomen crmd: [14603]: info: do_pe_invoke: Query 114: Requesting
the current CIB: S_POLICY_ENGINE
Mar 6 12:56:14 nomen cib: [14508]: info: cib_process_request: Operation
complete: op cib_delete for section constraints (origin=local/crm_resource/4):
ok (rc=0)
Mar 6 12:56:14 nomen crmd: [14603]: info: do_pe_invoke_callback: Invoking the
PE: ref=pe_calc-dc-1236372974-109, seq=2, quorate=1
Mar 6 12:56:14 nomen pengine: [14645]: WARN: unpack_resources: No STONITH
resources have been defined
Mar 6 12:56:14 nomen pengine: [14645]: info: determine_online_status: Node
rubric.esri.com is online
Mar 6 12:56:14 nomen pengine: [14645]: info: unpack_rsc_op: fs0_start_0 on
rubric.esri.com returned 1 (unknown error) instead of the expected value: 0 (ok)
Mar 6 12:56:14 nomen pengine: [14645]: WARN: unpack_rsc_op: Processing failed
op fs0_start_0 on rubric.esri.com: Error
Mar 6 12:56:14 nomen pengine: [14645]: WARN: unpack_rsc_op: Compatibility
handling for failed op fs0_start_0 on rubric.esri.com
Mar 6 12:56:14 nomen pengine: [14645]: info: determine_online_status: Node
nomen.esri.com is online
Mar 6 12:56:14 nomen pengine: [14645]: notice: clone_print: Master/Slave Set:
ms-drbd0
Mar 6 12:56:14 nomen pengine: [14645]: notice: native_print: drbd0:0
(ocf::heartbeat:drbd): Master nomen.esri.com
Mar 6 12:56:14 nomen pengine: [14645]: notice: native_print: drbd0:1
(ocf::heartbeat:drbd): Started rubric.esri.com
Mar 6 12:56:14 nomen pengine: [14645]: notice: native_print: VIP
(ocf::heartbeat:IPaddr): Started nomen.esri.com
Mar 6 12:56:14 nomen pengine: [14645]: notice: native_print: fs0
(ocf::heartbeat:Filesystem):Stopped
Mar 6 12:56:14 nomen pengine: [14645]: info: get_failcount: fs0 has failed
1000000 times on rubric.esri.com
Mar 6 12:56:14 nomen pengine: [14645]: info: master_color: Promoting drbd0:0
(Master nomen.esri.com)
Mar 6 12:56:14 nomen pengine: [14645]: info: master_color: ms-drbd0: Promoted
1 instances of a possible 1 to master
Mar 6 12:56:14 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
drbd0:0 (Master nomen.esri.com)
Mar 6 12:56:14 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
drbd0:1 (Slave rubric.esri.com)
Mar 6 12:56:14 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
drbd0:0 (Master nomen.esri.com)
Mar 6 12:56:14 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
drbd0:1 (Slave rubric.esri.com)
Mar 6 12:56:14 nomen pengine: [14645]: notice: NoRoleChange: Leave resource
VIP (Started nomen.esri.com)
Mar 6 12:56:14 nomen pengine: [14645]: notice: StartRsc: nomen.esri.com
Start fs0
Mar 6 12:56:14 nomen mgmtd: [14526]: info: CIB query: cib
Mar 6 12:56:14 nomen crmd: [14603]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Mar 6 12:56:14 nomen pengine: [14645]: info: process_pe_message: Transition
35: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-75.bz2
Mar 6 12:56:14 nomen pengine: [14645]: info: process_pe_message: Configuration
WARNINGs found during PE processing. Please run "crm_verify -L" to identify
issues.
Mar 6 12:56:14 nomen crmd: [14603]: info: unpack_graph: Unpacked transition
35: 1 actions in 1 synapses
Mar 6 12:56:14 nomen crmd: [14603]: info: do_te_invoke: Processing graph 35
(ref=pe_calc-dc-1236372974-109) derived from
/var/lib/heartbeat/pengine/pe-input-75.bz2
Mar 6 12:56:14 nomen crmd: [14603]: info: send_rsc_command: Initiating action
41: start fs0_start_0 on nomen.esri.com
Mar 6 12:56:14 nomen crmd: [14603]: info: do_lrm_rsc_op: Performing
key=41:35:0:44aada21-7997-4a4f-ba9a-4ae8a2629a58 op=fs0_start_0 )
Mar 6 12:56:14 nomen lrmd: [14509]: info: rsc:fs0: start
Mar 6 12:56:14 nomen Filesystem[1681]: INFO: Running start for /dev/drbd0 on
/data
Mar 6 12:56:14 nomen cib: [1678]: info: write_cib_contents: Wrote version
0.155.0 of the CIB to disk (digest: 0fd876c0a5f2db21a9aa66b3f997194f)
Mar 6 12:56:14 nomen cib: [1678]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
Mar 6 12:56:14 nomen cib: [14508]: info: Managed write_cib_contents process
1678 exited with return code 0.
Mar 6 12:56:14 nomen kernel: kjournald starting. Commit interval 5 seconds
Mar 6 12:56:14 nomen kernel: EXT3 FS on drbd0, internal journal
Mar 6 12:56:14 nomen kernel: EXT3-fs: mounted filesystem with ordered data
mode.
Mar 6 12:56:14 nomen lrmd: [14509]: info: Managed fs0:start process 1681
exited with return code 0.
Mar 6 12:56:14 nomen lrmd: [14509]: info: Resource Agent output: []
Mar 6 12:56:14 nomen crmd: [14603]: info: process_lrm_event: LRM operation
fs0_start_0 (call=47, rc=0, cib-update=115, confirmed=true) complete ok
Mar 6 12:56:15 nomen cib: [14508]: info: cib_process_request: Operation
complete: op cib_modify for section 'all' (origin=local/crmd/115): ok (rc=0)
Mar 6 12:56:15 nomen crmd: [14603]: info: match_graph_event: Action
fs0_start_0 (41) confirmed on nomen.esri.com (rc=0)
Mar 6 12:56:15 nomen crmd: [14603]: info: run_graph:
====================================================
Mar 6 12:56:15 nomen crmd: [14603]: notice: run_graph: Transition 35
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/heartbeat/pengine/pe-input-75.bz2): Complete
Mar 6 12:56:15 nomen crmd: [14603]: info: te_graph_trigger: Transition 35 is
now complete
Mar 6 12:56:15 nomen crmd: [14603]: info: notify_crmd: Transition 35 status:
done - <null>
Mar 6 12:56:15 nomen crmd: [14603]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 6 12:56:15 nomen haclient: on_event: from message queue: evt:cib_changed
Mar 6 12:56:15 nomen mgmtd: [14526]: info: CIB query: cib
Mar 6 12:56:15 nomen heartbeat: [14466]: WARN: G_CH_dispatch_int: Dispatch
function for read child took too long to execute: 70 ms (> 50 ms) (GSource:
0x94add68)
Mar 6 12:56:17 nomen lrmd: [14509]: info: Resource Agent output: []
Regards,
jerome
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Dominik Klein
Sent: Wednesday, March 04, 2009 10:54 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker
Hi
Jerome Yanga wrote:
> Hi! I am having issues with getting DRBD to work with Pacemaker. I can get
> Pacemaker and DRBD run individually but not DRBD managed by Pacemaker. I
> tried following the instruction in the site below but the resources will not
> go online.
>
> http://clusterlabs.org/wiki/DRBD_HowTo_1.0
>
> Below is my configuration.
>
> Installed applications:
> =======================
> kernel-2.6.18-128.el5
copy that
> drbd-8.3.0-3
> heartbeat-2.99.2-6.1
> pacemaker-1.0.1-3.1
>
>
>
> drbd.conf:
> ==========
> global {
> usage-count no;
> }
>
> resource r0 {
> protocol C;
> handlers {
> pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
> pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
> outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
> pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD
> Alert' root";
> out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
> }
> startup {
> wfc-timeout 0;
> }
>
> disk {
> on-io-error pass_on;
> }
> net {
> max-buffers 2048;
> after-sb-0pri disconnect;
> after-sb-1pri disconnect;
> after-sb-2pri disconnect;
> rr-conflict disconnect;
> }
> syncer {
> rate 100M;
> al-extents 257;
> }
> on nomen.esri.com {
> device /dev/drbd0;
> disk /dev/sda5;
> address 192.168.0.1:7789;
> meta-disk internal;
> }
> on rubric.esri.com {
> device /dev/drbd0;
> disk /dev/sda5;
> address 192.168.0.2:7789;
> meta-disk internal;
> }
> }
>
>
>
> Cib.xml:
> ========
> <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
> have-quorum="1" dc-uuid="a5
> e95310-f27d-418e-9cb9-42e50310f702" epoch="56" num_updates="0"
> cib-last-written="Wed Mar 4 14:27:59
> 2009">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="1.0.1-node: 6fc5ce830
> 2abf145a02891ec41e5a492efbe8efe"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="3a8b681c-a14b-4037-a8e6-2d4af2eff88e" uname="nomen.esri.com"
> type="normal"/>
> <node id="a5e95310-f27d-418e-9cb9-42e50310f702" uname="rubric.esri.com"
> type="normal"/>
> </nodes>
> <resources>
> <master id="ms-drbd0">
> <meta_attributes id="ms-drbd0-meta_attributes">
> <nvpair id="ms-drbd0-meta_attributes-clone-max" name="clone-max"
> value="2"/>
> <nvpair id="ms-drbd0-meta_attributes-notify" name="notify"
> value="true"/>
> <nvpair id="ms-drbd0-meta_attributes-globally-unique"
> name="globally-unique" value="false"
> />
> <nvpair name="target-role"
> id="ms-drbd0-meta_attributes-target-role" value="Started"/>
> </meta_attributes>
> <primitive class="ocf" id="drbd0" provider="heartbeat" type="drbd">
> <instance_attributes id="drbd0-instance_attributes">
> <nvpair id="drbd0-instance_attributes-drbd_resource"
> name="drbd_resource" value="r0"/>
> </instance_attributes>
> <operations id="drbd0-ops">
> <op id="drbd0-monitor-59s" interval="59s" name="monitor"
> role="Master" timeout="30s"/>
> <op id="drbd0-monitor-60s" interval="60s" name="monitor"
> role="Slave" timeout="30s"/>
> </operations>
> </primitive>
> </master>
> </resources>
> <constraints/>
> </configuration>
> </cib>
>
>
> /var/log/messages:
> ==================
> Mar 4 14:27:58 nomen crm_resource: [30167]: info: Invoked: crm_resource
> --meta -r ms-drbd0 -p target-role -v Started
> Mar 4 14:27:58 nomen cib: [29899]: info: cib_process_xpath: Processing
> cib_query op for
> //cib/configuration/resources//*[@id="ms-drbd0"]//meta_attributes//nvpair[@name="target-role"]
> (/cib/configuration/resources/master/meta_attributes/nvpair[4])
> Mar 4 14:27:59 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=5:5:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:0_start_0 )
> Mar 4 14:27:59 nomen haclient: on_event:evt:cib_changed
> Mar 4 14:27:59 nomen lrmd: [29900]: info: rsc:drbd0:0: start
> Mar 4 14:27:59 nomen cib: [30168]: info: write_cib_contents: Wrote version
> 0.56.0 of the CIB to disk (digest: 2365d9802f1b9c55e0ed87b8ebda5db3)
> Mar 4 14:27:59 nomen cib: [30168]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> Mar 4 14:27:59 nomen cib: [29899]: info: Managed write_cib_contents process
> 30168 exited with return code 0.
> Mar 4 14:27:59 nomen modprobe: FATAL: Module drbd not found.
> Mar 4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout)
> Mar 4 14:27:59 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout)
> Could not stat("/proc/drbd"): No such file or directory do you need to load
> the module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5
> /dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on'
> terminated with exit code 20 drbdadm attach r0: exited with code 20
> Mar 4 14:27:59 nomen drbd[30169]: ERROR: r0 start: not in Secondary mode
> after start.
> Mar 4 14:27:59 nomen lrmd: [29900]: WARN: Managed drbd0:0:start process
> 30169 exited with return code 1.
> Mar 4 14:27:59 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:0_start_0 (call=3, rc=1, cib-update=13, confirmed=true) complete
> unknown error
> Mar 4 14:27:59 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:27:59 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:00 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=41:6:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:0_notify_0 )
> Mar 4 14:28:00 nomen lrmd: [29900]: info: rsc:drbd0:0: notify
> Mar 4 14:28:00 nomen lrmd: [29900]: info: Managed drbd0:0:notify process
> 30310 exited with return code 0.
> Mar 4 14:28:00 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:0_notify_0 (call=4, rc=0, cib-update=14, confirmed=true) complete ok
> Mar 4 14:28:00 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:00 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:00 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:01 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=2:6:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:0_stop_0 )
> Mar 4 14:28:01 nomen lrmd: [29900]: info: rsc:drbd0:0: stop
> Mar 4 14:28:01 nomen lrmd: [29900]: info: Managed drbd0:0:stop process 30324
> exited with return code 0.
> Mar 4 14:28:01 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:0_stop_0 (call=5, rc=0, cib-update=15, confirmed=true) complete ok
> Mar 4 14:28:01 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:01 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:01 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:02 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=10:6:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:1_start_0 )
> Mar 4 14:28:02 nomen lrmd: [29900]: info: rsc:drbd0:1: start
> Mar 4 14:28:02 nomen modprobe: FATAL: Module drbd not found.
> Mar 4 14:28:02 nomen lrmd: [29900]: info: RA output: (drbd0:1:start:stdout)
> Mar 4 14:28:02 nomen lrmd: [29900]: info: RA output: (drbd0:1:start:stdout)
> Could not stat("/proc/drbd"): No such file or directory do you need to load
> the module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5
> /dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on'
> terminated with exit code 20 drbdadm attach r0: exited with code 20
> Mar 4 14:28:02 nomen drbd[30338]: ERROR: r0 start: not in Secondary mode
> after start.
> Mar 4 14:28:02 nomen lrmd: [29900]: WARN: Managed drbd0:1:start process
> 30338 exited with return code 1.
> Mar 4 14:28:02 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:1_start_0 (call=6, rc=1, cib-update=16, confirmed=true) complete
> unknown error
> Mar 4 14:28:02 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:02 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:02 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:03 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=44:7:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:1_notify_0 )
> Mar 4 14:28:03 nomen lrmd: [29900]: info: rsc:drbd0:1: notify
> Mar 4 14:28:03 nomen lrmd: [29900]: info: Managed drbd0:1:notify process
> 30472 exited with return code 0.
> Mar 4 14:28:03 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:1_notify_0 (call=7, rc=0, cib-update=17, confirmed=true) complete ok
> Mar 4 14:28:03 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:03 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:03 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:04 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=2:7:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:1_stop_0 )
> Mar 4 14:28:04 nomen lrmd: [29900]: info: rsc:drbd0:1: stop
> Mar 4 14:28:04 nomen lrmd: [29900]: info: Managed drbd0:1:stop process 30486
> exited with return code 0.
> Mar 4 14:28:04 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:1_stop_0 (call=8, rc=0, cib-update=18, confirmed=true) complete ok
> Mar 4 14:28:04 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:04 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:04 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:05 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=7:7:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:0_start_0 )
> Mar 4 14:28:05 nomen lrmd: [29900]: info: rsc:drbd0:0: start
> Mar 4 14:28:05 nomen modprobe: FATAL: Module drbd not found.
> Mar 4 14:28:05 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout)
> Mar 4 14:28:05 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout)
> Could not stat("/proc/drbd"): No such file or directory do you need to load
> the module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5
> /dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on'
> terminated with exit code 20 drbdadm attach r0: exited with code 20
> Mar 4 14:28:05 nomen drbd[30500]: ERROR: r0 start: not in Secondary mode
> after start.
> Mar 4 14:28:05 nomen lrmd: [29900]: WARN: Managed drbd0:0:start process
> 30500 exited with return code 1.
> Mar 4 14:28:05 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:0_start_0 (call=9, rc=1, cib-update=19, confirmed=true) complete
> unknown error
> Mar 4 14:28:05 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:05 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:06 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=38:8:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:0_notify_0 )
> Mar 4 14:28:06 nomen lrmd: [29900]: info: rsc:drbd0:0: notify
> Mar 4 14:28:06 nomen lrmd: [29900]: info: Managed drbd0:0:notify process
> 30634 exited with return code 0.
> Mar 4 14:28:06 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:0_notify_0 (call=10, rc=0, cib-update=20, confirmed=true) complete ok
> Mar 4 14:28:06 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:06 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:07 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing
> key=1:8:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:0_stop_0 )
> Mar 4 14:28:07 nomen lrmd: [29900]: info: rsc:drbd0:0: stop
> Mar 4 14:28:07 nomen lrmd: [29900]: info: Managed drbd0:0:stop process 30648
> exited with return code 0.
> Mar 4 14:28:07 nomen crmd: [29903]: info: process_lrm_event: LRM operation
> drbd0:0_stop_0 (call=11, rc=0, cib-update=21, confirmed=true) complete ok
> Mar 4 14:28:07 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:07 nomen mgmtd: [29904]: info: CIB query: cib
> Mar 4 14:28:08 nomen haclient: on_event: from message queue: evt:cib_changed
> Mar 4 14:28:08 nomen mgmtd: [29904]: info: CIB query: cib
>
> FYI, I had to add the following line to /etc/init.d/drbd to get it working.
>
> insmod /lib/modules/2.6.18-92.1.22.el5/kernel/drivers/block/drbd.ko
Copied from the start of your email:
kernel-2.6.18-128.el5
So your kernel module (built for 2.6.18-92.1.22.el5, per your insmod line)
does not match your running kernel, and therefore the modprobe command cannot
find the module.
Recompile drbd against your running kernel.
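One quick way to catch this in the future is to compare the running kernel
against the kernel string encoded in the drbd-km package name (a sketch;
drbd-km encodes '-' as '_' in the kernel part of its name, as in
drbd-km-2.6.18_128.1.1.el5-8.2.7-3):

```shell
#!/bin/sh
# Sketch: succeed (exit 0) when the kernel string from a drbd-km package
# name matches the running kernel. The package name encodes '-' as '_'
# in its kernel part, e.g. drbd-km-2.6.18_128.1.1.el5-8.2.7-3.
kmod_matches_kernel() {
    pkg_kernel=$(printf '%s' "$1" | tr '_' '-')
    [ "$pkg_kernel" = "$2" ]
}

# Usage on a node (hypothetical):
#   kmod_matches_kernel "2.6.18_128.1.1.el5" "$(uname -r)" \
#       || echo "drbd module was built for a different kernel -- recompile"
```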
Regards
Dominik
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems