Hi all,
I'm trying to set up a DRBD/Pacemaker cluster on two CentOS 6.4 KVM guests. The
guests run on a CentOS 6.4 host, are installed on separate logical volumes, and
each has 1 GB of RAM allocated.
DRBD starts manually, and promoting/demoting the device via drbdadm works fine.
Pacemaker appears to start all resources without error, but the drbd monitor
operation then fails: a process times out and all resources are restarted on
the primary node again.
While both nodes are online, DRBD looks healthy in "service drbd status" and
/proc/drbd; however, DRBD fails to promote on the remaining node when the
primary is put into standby.
I've outlined the cluster and DRBD configuration below and attached the
Pacemaker logs, which should illustrate the situation and behaviour. I'm not
sure what is causing the issue; I'd appreciate any assistance, and please let
me know if you need more information.
Cheers,
Jimmy.
# cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6,
2012-09-06 08:16:10
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:376 nr:0 dw:376 dr:6961 al:7 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
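For what it's worth, this is roughly how I pull the state fields out of /proc/drbd when scripting checks around the cluster (a rough awk sketch of my own, not anything from the cluster config):

```shell
# Sketch: extract the cs:/ro:/ds: fields from a /proc/drbd-style file.
# The field names match the output above; the function name is my own.
parse_drbd_state() {
  awk '/^ *[0-9]+: cs:/ {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^cs:/) cs = substr($i, 4)   # connection state, e.g. Connected
      if ($i ~ /^ro:/) ro = substr($i, 4)   # roles, e.g. Primary/Secondary
      if ($i ~ /^ds:/) ds = substr($i, 4)   # disk states, e.g. UpToDate/UpToDate
    }
    print cs, ro, ds
  }' "$1"
}
```

Run against the healthy node above, `parse_drbd_state /proc/drbd` prints "Connected Primary/Secondary UpToDate/UpToDate".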
After the primary node is put into standby mode:
# cat /proc/drbd (webtext-2)
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6,
2012-09-06 08:16:10
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Outdated C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
# cat /proc/drbd (webtext-1)
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by dag@Build64R6,
2012-09-06 08:16:10
The lrmd/crmd timeouts from the logs:
Jul 3 21:47:19 webtext-2 lrmd[2511]: warning: child_timeout_callback:
mysql_drbd_monitor_0 process (PID 18391) timed out
Jul 3 21:47:21 webtext-2 crmd[2514]: error: process_lrm_event: LRM
operation mysql_drbd_monitor_0 (666) Timed Out (timeout=20000ms)
Jul 3 22:51:46 webtext-2 lrmd[19204]: warning: child_timeout_callback:
mysql_drbd_promote_0 process (PID 21046) timed out
Jul 3 22:51:47 webtext-2 crmd[19207]: error: process_lrm_event: LRM
operation mysql_drbd_promote_0 (189) Timed Out (timeout=20000ms)
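The 20000ms in those messages is Pacemaker's default per-operation timeout; as the configuration below shows, the monitor and promote operations on mysql_drbd don't declare explicit timeouts of their own. For reference, per-op timeouts can be declared in the primitive like this (a sketch only; the timeout values here are illustrative, not tested):

```
primitive mysql_drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="130" role="Master" timeout="60" \
        op monitor interval="140" role="Slave" timeout="60" \
        op promote interval="0" timeout="90" \
        op stop interval="0" timeout="240" \
        op start interval="0" timeout="320"
```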
crmd extract from the logs:
Jul 03 21:18:32 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 51: notify mysql_drbd_pre_notify_stop_0
on webtext-1
Jul 03 21:18:40 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 2: stop mysql_drbd_stop_0 on webtext-2
(local)
Jul 03 21:18:46 [2514] webtext-2.vennetics.com crmd: info:
process_graph_event: Detected action (74.2) mysql_drbd_stop_0.650=not
installed: failed
Jul 03 21:18:46 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 52: notify mysql_drbd_post_notify_stop_0
on webtext-1
Jul 03 21:22:42 [2514] webtext-2.vennetics.com crmd: info:
process_graph_event: Detected action (61.12)
mysql_drbd_monitor_130000.354=not running: failed
Jul 03 21:22:42 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 55: notify mysql_drbd_pre_notify_demote_0
on webtext-1
Jul 03 21:22:50 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 55: notify mysql_drbd_pre_notify_demote_0
on webtext-1
Jul 03 21:22:55 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 9: demote mysql_drbd_demote_0 on webtext-1
Jul 03 21:23:00 [2514] webtext-2.vennetics.com crmd: info:
process_graph_event: Detected action (77.9)
mysql_drbd_demote_0.371=unknown error: failed
Jul 03 21:23:00 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 56: notify mysql_drbd_post_notify_demote_0
on webtext-1
Jul 03 21:23:05 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 48: notify mysql_drbd_pre_notify_stop_0 on
webtext-1
Jul 03 21:23:07 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 2: stop mysql_drbd_stop_0 on webtext-1
Jul 03 21:23:11 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 7: start mysql_drbd_start_0 on webtext-1
Jul 03 21:23:19 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 46: notify mysql_drbd_post_notify_start_0
on webtext-1
Jul 03 21:23:23 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 53: notify mysql_drbd_pre_notify_promote_0
on webtext-1
Jul 03 21:23:25 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 9: promote mysql_drbd_promote_0 on
webtext-1
Jul 03 21:23:30 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 54: notify
mysql_drbd_post_notify_promote_0 on webtext-1
Jul 03 21:23:36 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 10: monitor mysql_drbd_monitor_130000 on
webtext-1
Jul 03 21:27:40 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 52: notify mysql_drbd_pre_notify_demote_0
on webtext-1
Jul 03 21:27:43 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 8: demote mysql_drbd_demote_0 on webtext-1
Jul 03 21:27:50 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 53: notify mysql_drbd_post_notify_demote_0
on webtext-1
Jul 03 21:28:04 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 49: notify mysql_drbd_pre_notify_stop_0 on
webtext-1
Jul 03 21:28:21 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 48: notify mysql_drbd_pre_notify_stop_0 on
webtext-1
Jul 03 21:28:40 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 7: stop mysql_drbd_stop_0 on webtext-1
Jul 03 21:45:32 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 4: monitor mysql_fs_monitor_0 on webtext-2
(local)
Jul 03 21:45:42 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 3: probe_complete probe_complete on
webtext-2 (local) - no waiting
Jul 03 21:46:58 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 4: monitor mysql_drbd:0_monitor_0 on
webtext-2 (local)
Jul 03 21:47:23 [2514] webtext-2.vennetics.com crmd: info:
process_graph_event: Detected action (88.4)
mysql_drbd_monitor_0.666=unknown error: failed
Jul 03 21:47:23 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 3: probe_complete probe_complete on
webtext-2 (local) - no waiting
Jul 03 21:47:29 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 42: notify mysql_drbd_pre_notify_stop_0 on
webtext-2 (local)
Jul 03 21:47:44 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 43: notify mysql_drbd_pre_notify_stop_0 on
webtext-2 (local)
Jul 03 21:47:47 [2514] webtext-2.vennetics.com crmd: notice:
te_rsc_command: Initiating action 1: stop mysql_drbd_stop_0 on webtext-2
(local)
Jul 03 22:06:16 [19207] webtext-2.vennetics.com crmd: info:
services_os_action_execute: Managed Filesystem_meta-data_0 process 19326
exited with rc=0
Jul 03 22:06:18 [19207] webtext-2.vennetics.com crmd: info:
services_os_action_execute: Managed drbd_meta-data_0 process 19332 exited
with rc=0
Jul 03 22:06:54 [19207] webtext-2.vennetics.com crmd: info:
services_os_action_execute: Managed drbd_meta-data_0 process 19396 exited
with rc=0
Jul 03 22:52:22 [19207] webtext-2.vennetics.com crmd: info:
services_os_action_execute: Managed drbd_meta-data_0 process 21326 exited
with rc=0
# crm_mon --inactive --group-by-node -1
Last updated: Wed Jul 3 23:17:32 2013
Last change: Wed Jul 3 22:51:38 2013 via cibadmin on webtext-2
Stack: cman
Current DC: webtext-1 - partition with quorum
Version: 1.1.10-1.el6-2718638
2 Nodes configured, unknown expected votes
5 Resources configured.
Node webtext-1: standby
Node webtext-2: online
mysql_drbd (ocf::linbit:drbd): Started
Inactive resources:
Master/Slave Set: mysql_ms [mysql_drbd]
Slaves: [ webtext-2 ]
Stopped: [ webtext-1 ]
Resource Group: mysql
mysql_fs (ocf::heartbeat:Filesystem): Stopped
mysql_init (lsb:mysql): Stopped
jboss_init (lsb:jboss): Stopped
Failed actions:
mysql_drbd_monitor_130000 (node=webtext-1, call=349, rc=1, status=Timed Out,
last-rc-change=Wed Jul 3 22:44:49 2013, queued=0ms, exec=0ms): unknown error
mysql_drbd_promote_0 (node=webtext-2, call=189, rc=1, status=Timed Out,
last-rc-change=Wed Jul 3 22:51:25 2013, queued=20980ms, exec=13ms): unknown error
# crm configure show
node webtext-1 \
attributes standby="on"
node webtext-2 \
attributes standby="off"
primitive jboss_init lsb:jboss \
op monitor interval="40" timeout="120" start-delay="320" \
op start interval="0" timeout="320" \
op stop interval="0" timeout="320" \
meta target-role="Started"
primitive mysql_drbd ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="130" role="Master" \
op monitor interval="140" role="Slave" \
op stop interval="0" timeout="240" \
op start interval="0" timeout="320" \
meta target-role="Started"
primitive mysql_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0" directory="/drbd0/" fstype="ext4" \
op stop interval="0" timeout="120" \
op start interval="0" timeout="120" \
op monitor interval="40" timeout="120" \
meta is-managed="true"
primitive mysql_init lsb:mysql \
op stop interval="0" timeout="320" \
op start interval="0" timeout="320" \
meta is-managed="true"
group mysql mysql_fs mysql_init \
meta target-role="Started"
ms mysql_ms mysql_drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" is-managed="true"
location drbd-fence-by-handler-r0-mysql_ms mysql_ms \
rule $id="drbd-fence-by-handler-r0-rule-mysql_ms" $role="Master" -inf: #uname ne webtext-2.vennetics.com
colocation jboss_with_mysql inf: jboss_init mysql
colocation mysql_on_drbd inf: mysql mysql_ms:Master
order jboss_after_mysql inf: mysql_init jboss_init
order mysql_after_drbd inf: mysql_ms:promote mysql:start
property $id="cib-bootstrap-options" \
dc-version="1.1.10-1.el6-2718638" \
cluster-infrastructure="cman" \
stonith-enabled="false" \
last-lrm-refresh="1372884412" \
no-quorum-policy="ignore"
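One thing I noticed while pasting this (just an observation, not a confirmed diagnosis): the drbd-fence-by-handler constraint — the one crm-fence-peer.sh adds and crm-unfence-peer.sh is supposed to remove after resync — matches #uname against the FQDN webtext-2.vennetics.com, while cluster.conf names the nodes with the short forms webtext-1/webtext-2. Whether that matters depends on what uname -n returns on each node; a trivial check along these lines (names copied from the configs in this mail):

```shell
# Compare the node name cluster.conf uses with the name the
# drbd-fence-by-handler rule matches on.
cluster_name="webtext-2"                   # from <clusternode name="webtext-2" .../>
constraint_name="webtext-2.vennetics.com"  # from "#uname ne webtext-2.vennetics.com"
if [ "$cluster_name" = "$constraint_name" ]; then
  result="names match"
else
  result="name mismatch: $cluster_name vs $constraint_name"
fi
echo "$result"
```

On a live node, crm_node -n should show the name Pacemaker actually uses, which is what #uname is compared against.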
# vi /etc/drbd.conf
global {
usage-count yes;
}
resource r0 {
# write IO is reported as completed if it has reached both local
# and remote disk
protocol C;
net {
# set up peer authentication
cram-hmac-alg sha1;
shared-secret "test";
}
startup {
# wait for connection timeout - boot process blocked
# until DRBD resources are connected
# ----- wfc-timeout 30;
# WFC timeout if peer was outdated
# ----- outdated-wfc-timeout 20;
# WFC timeout if this node was in a degraded cluster (i.e. only had one
# node left)
# ----- degr-wfc-timeout 30;
}
disk {
fencing resource-only;
}
handlers {
fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
# first node
on webtext-1.vennetics.com {
# DRBD device
device /dev/drbd0;
# backing store device
disk /dev/vg_webtext1_02/lv_drbd0;
# IP address of node, and port to listen on
address 10.87.79.218:7788;
# use internal meta data (don't create a filesystem before
# you create metadata!)
meta-disk internal;
}
# second node
on webtext-2.vennetics.com {
# DRBD device
device /dev/drbd0;
# backing store device
disk /dev/vg_webtext2_02/lv_drbd0;
# IP address of node, and port to listen on
address 10.87.79.219:7788;
# use internal meta data (don't create a filesystem before
# you create metadata!)
meta-disk internal;
}
}
# vi /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="webtext_cluster">
<clusternodes>
<clusternode name="webtext-1" nodeid="1">
<fence>
<method name="pcmk-redirect">
<device name="pcmk" port="webtext-1"/>
</method>
</fence>
</clusternode>
<clusternode name="webtext-2" nodeid="2">
<fence>
<method name="pcmk-redirect">
<device name="pcmk" port="webtext-2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_pcmk" name="pcmk"/>
</fencedevices>
<cman expected_votes="1" two_node="1"/>
<logging to_syslog="yes" to_logfile="yes" syslog_facility="daemon"
syslog_priority="info" logfile_priority="info">
<logging_daemon name="qdiskd"
logfile="/var/log/cluster/qdiskd.log" logfile_priority="debug"/>
<logging_daemon name="fenced"
logfile="/var/log/cluster/fenced.log" logfile_priority="debug"/>
<logging_daemon name="dlm_controld"
logfile="/var/log/cluster/dlm_controld.log"
logfile_priority="debug"/>
<logging_daemon name="gfs_controld"
logfile="/var/log/cluster/gfs_controld.log"
logfile_priority="debug"/>
<logging_daemon name="corosync"
logfile="/var/log/cluster/corosync.log" logfile_priority="debug"/>
</logging>
</cluster>
# vi /etc/sysconfig/pacemaker
# For non-systemd based systems, prefix export to each enabled line
# Turn on special handling for CMAN clusters in the init script
# Without this, fenced (and by inference, cman) cannot reliably be made to shut down
PCMK_STACK=cman
#==#==# Variables that control logging
# Enable debug logging globally or per-subsystem
# Multiple subsystems may be listed separated by commas
PCMK_debug=crmd,pengine,cib,stonith-ng,attrd,pacemakerd
# rpm -qa | grep pacemaker
pacemaker-cli-1.1.10-1.el6.x86_64
pacemaker-libs-1.1.10-1.el6.x86_64
pacemaker-cluster-libs-1.1.10-1.el6.x86_64
pacemaker-libs-devel-1.1.10-1.el6.x86_64
pacemaker-remote-1.1.10-1.el6.x86_64
pacemaker-cts-1.1.10-1.el6.x86_64
pacemaker-debuginfo-1.1.10-1.el6.x86_64
pacemaker-1.1.10-1.el6.x86_64
# rpm -qa | grep cman
cman-3.0.12.1-49.el6.x86_64
# rpm -qa | grep coro
corosync-1.4.1-15.el6_4.1.x86_64
corosynclib-1.4.1-15.el6_4.1.x86_64
corosynclib-devel-1.4.1-15.el6_4.1.x86_64
# rpm -qa | grep resource-agents
resource-agents-3.9.2-21.el6.x86_64
# rpm -qa | grep libqb
libqb-devel-0.14.2-3.el6.x86_64
libqb-0.14.2-3.el6.x86_64
# rpm -qa | grep drbd
drbd84-utils-8.4.2-1.el6.elrepo.x86_64
kmod-drbd84-8.4.2-1.el6_3.elrepo.x86_64
# ifconfig (webtext-2)
eth0 Link encap:Ethernet HWaddr 52:54:00:65:EC:27
inet addr:10.87.79.217 Bcast:10.87.79.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe65:ec27/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:116526 errors:0 dropped:0 overruns:0 frame:0
TX packets:104213 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:19340444 (18.4 MiB) TX bytes:53027494 (50.5 MiB)
Interrupt:10 Base address:0xc000
eth1 Link encap:Ethernet HWaddr 52:54:00:95:68:C1
inet addr:10.87.79.219 Bcast:10.87.79.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe95:68c1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:436 errors:0 dropped:0 overruns:0 frame:0
TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:25724 (25.1 KiB) TX bytes:900 (900.0 b)
Interrupt:10 Base address:0xe000
# ifconfig (webtext-1)
eth0 Link encap:Ethernet HWaddr 52:54:00:CB:9A:F4
inet addr:10.87.79.216 Bcast:10.87.79.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fecb:9af4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:121593 errors:0 dropped:0 overruns:0 frame:0
TX packets:107920 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:54007733 (51.5 MiB) TX bytes:22464750 (21.4 MiB)
Interrupt:10 Base address:0xc000
eth1 Link encap:Ethernet HWaddr 52:54:00:30:07:C9
inet addr:10.87.79.218 Bcast:10.87.79.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe30:7c9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:510 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:29934 (29.2 KiB) TX bytes:720 (720.0 b)
Interrupt:10 Base address:0xe000
# lvdisplay
--- Logical volume ---
LV Path /dev/vg_webtext2_02/lv_drbd0
LV Name lv_drbd0
VG Name vg_webtext2_02
LV UUID d8ATq9-XPqT-mTAZ-By3H-dEoL-SoDV-ebCJL3
LV Write Access read/write
LV Creation host, time webtext-2.vennetics.com, 2013-06-30 10:35:10 +0100
LV Status available
# open 2
LV Size 4.00 GiB
Current LE 1023
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2
# lvdisplay
--- Logical volume ---
LV Path /dev/vg_webtext1_02/lv_drbd0
LV Name lv_drbd0
VG Name vg_webtext1_02
LV UUID 3qB7lS-zH0O-WIKC-F6nl-0cuE-2Zu9-95RkF9
LV Write Access read/write
LV Creation host, time webtext-1.vennetics.com, 2013-06-30 12:00:59 +0100
LV Status available
# open 0
LV Size 4.00 GiB
Current LE 1023
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems