Hello guys,
For two weeks I have been struggling with a DRBD & Pacemaker configuration in order to
get an HA NFS server.
I tried all the examples Google could show me, without success.
I have also read lots of threads on this mailing list and was not able to
end up with a working configuration either.
The thread "secundary not finish synchronizing" is interesting,
especially this quote:
"To be able to avoid DRBD data divergence due to cluster split-brain,
you'd need both. Stonith alone is not good enough, DRBD fencing
policies alone are not good enough. You need both."
but I am still not able to make it work.
Now that I have expressed my feelings about the products :), let me summarize
my setup:
2 identical VMs, each with an LVM volume and a SINGLE NIC
DRBD 9.0.9
# rpm -qa | grep drbd
drbd90-utils-9.1.0-1.el7.elrepo.x86_64
kmod-drbd90-9.0.9-1.el7_4.elrepo.x86_64

Pacemaker 1.1.16
# rpm -qa | grep pacemaker
pacemaker-1.1.16-12.el7_4.8.x86_64
pacemaker-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-cluster-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-cli-1.1.16-12.el7_4.8.x86_64

Corosync 2.4.0
# rpm -qa | grep corosync
corosynclib-2.4.0-9.el7_4.2.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
DRBD resource on both nodes:
# cat /etc/drbd.d/r0.res
resource r0 {
  net {
    # fencing resource-only; fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
  }
  protocol C;
  on nfs1 {
    device /dev/drbd0; disk /dev/mapper/vg_cdf-lv_cdf;
    address 10.200.50.21:7788; meta-disk internal;
  }
  on nfs2 {
    device /dev/drbd0; disk /dev/mapper/vg_cdf-lv_cdf;
    address 10.200.50.22:7788; meta-disk internal;
  }
}
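If I read the quote above correctly, the intention would be to enable both the handlers and the DRBD fencing policy, so the net section of r0.res would look something like this (just a sketch of my understanding; the rest of the file stays unchanged):

  net {
    fencing resource-and-stonith;   # DRBD fencing policy, on top of the crm-fence-peer handlers above
  }

For now that line is commented out in my config, as shown above.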
Everything is good up until now; I mounted the volume on both nodes and was able
to see the data being replicated.
The problem occurs with Pacemaker on top, because I was not able to
configure it to get a Master and a Slave instance, only a Master and a Stopped
one.
Here are the Pacemaker configs:
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.200.50.20 cidr_netmask=24 op monitor interval=30s

pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create Data ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
pcs -f drbd_cfg resource master DataClone Data master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f drbd_cfg constraint colocation add DataClone with ClusterIP INFINITY
pcs -f drbd_cfg constraint order ClusterIP then DataClone
pcs cluster cib-push drbd_cfg
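In case it is relevant, this is roughly how I check the state after pushing that CIB (a sketch; the resource and node names are the ones defined above):

systemctl is-enabled drbd   # should report "disabled" if only Pacemaker is supposed to start DRBD
drbdadm status r0           # expecting one Primary and one Secondary once the clone runs on both nodes
pcs status resources        # expecting Masters: [ nfs1 ] and Slaves: [ nfs2 ] under DataClone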
pcs cluster cib fs_cfg
pcs -f fs_cfg resource create DataFS Filesystem device="/dev/drbd0" directory="/var/vols/itom" fstype="xfs"
pcs -f fs_cfg constraint colocation add DataFS with DataClone INFINITY with-rsc-role=Master
pcs -f fs_cfg constraint order promote DataClone then start DataFS
pcs cluster cib-push fs_cfg
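A similar quick check for the filesystem part (again just a sketch, paths taken from the commands above):

pcs resource show DataFS   # confirm the Filesystem resource parameters
mount | grep drbd0         # on the master node: /dev/drbd0 mounted on /var/vols/itom as xfs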
pcs cluster cib nfs_cfg
pcs -f nfs_cfg resource create nfsd nfsserver nfs_shared_infodir=/var/vols/nfsinfo
pcs -f nfs_cfg resource create nfscore exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/core fsid=1999
pcs -f nfs_cfg resource create nfsdca exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/dca fsid=1999
pcs -f nfs_cfg resource create nfsnode1 exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/node1 fsid=1999
pcs -f nfs_cfg resource create nfsnode2 exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/node2 fsid=1999
pcs -f nfs_cfg constraint order DataFS then nfsd
pcs -f nfs_cfg constraint order nfsd then nfscore
pcs -f nfs_cfg constraint order nfsd then nfsdca
pcs -f nfs_cfg constraint order nfsd then nfsnode1
pcs -f nfs_cfg constraint order nfsd then nfsnode2
pcs -f nfs_cfg constraint colocation add nfsd with DataFS INFINITY
pcs -f nfs_cfg constraint colocation add nfscore with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsdca with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsnode1 with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsnode2 with nfsd INFINITY
pcs cluster cib-push nfs_cfg
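Once everything is started on the master, this is roughly how I verify the exports from a client (the client-side mount point below is just an example):

showmount -e 10.200.50.20                       # list the exports via the cluster VIP
mount -t nfs 10.200.50.20:/var/vols/core /mnt   # example mount of one export through the VIP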
pcs stonith create nfs1_fen fence_ipmilan pcmk_host_list="nfs1" ipaddr=100.200.50.21 login=user passwd=pass lanplus=1 cipher=1 op monitor interval=60s
pcs constraint location nfs1_fen avoids nfs1
pcs stonith create nfs2_fen fence_ipmilan pcmk_host_list="nfs2" ipaddr=100.200.50.22 login=user passwd=pass lanplus=1 cipher=1 op monitor interval=60s
pcs constraint location nfs2_fen avoids nfs2
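Given the start timeouts shown in the status below, I assume the agent can also be tested by hand from each node with something like this (same addresses and credentials as in the stonith resources):

fence_ipmilan -a 100.200.50.21 -l user -p pass -P -o status   # query the IPMI endpoint for nfs1
fence_ipmilan -a 100.200.50.22 -l user -p pass -P -o status   # query the IPMI endpoint for nfs2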
And here is the status of the cluster:
# pcs status
Cluster name: nfs-cluster
Stack: corosync
Current DC: nfs2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu Apr 26 13:31:20 2018
Last change: Thu Apr 26 09:10:44 2018 by root via cibadmin on nfs1

2 nodes configured
11 resources configured

Online: [ nfs1 nfs2 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):       Started nfs1
 Master/Slave Set: DataClone [Data]
     Masters: [ nfs1 ]
     Stopped: [ nfs2 ]
 DataFS     (ocf::heartbeat:Filesystem):    Started nfs1
 nfsd       (ocf::heartbeat:nfsserver):     Started nfs1
 nfscore    (ocf::heartbeat:exportfs):      Started nfs1
 nfsdca     (ocf::heartbeat:exportfs):      Started nfs1
 nfsnode1   (ocf::heartbeat:exportfs):      Started nfs1
 nfsnode2   (ocf::heartbeat:exportfs):      Started nfs1
 nfs1_fen   (stonith:fence_ipmilan):        Stopped
 nfs2_fen   (stonith:fence_ipmilan):        Stopped

Failed Actions:
* nfs1_fen_start_0 on nfs2 'unknown error' (1): call=97, status=Timed Out, exitreason='none',
    last-rc-change='Thu Apr 26 09:10:45 2018', queued=0ms, exec=20009ms
* nfs2_fen_start_0 on nfs1 'unknown error' (1): call=118, status=Timed Out, exitreason='none',
    last-rc-change='Thu Apr 26 09:11:03 2018', queued=0ms, exec=20013ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
So, with the above config, I see DRBD started on the "promoted" master node but
stuck in a connecting state, because DRBD on the "slave" is not running at all.
This is my first concern: how do I instruct Pacemaker to start the DRBD processes on
both hosts/VMs at cluster startup, so that I get a real Master/Slave pair and the
synchronization can happen? Right now I have to manually start DRBD on the slave
before the remaining resources get deployed/started, so there is no automation/resilience, etc.
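To be precise, by "manually start" I mean something like the following on the node where the clone instance shows as Stopped:

drbdadm up r0       # bring the DRBD resource up by hand, outside of Pacemaker
drbdadm status r0   # after this the sync starts and the remaining resources follow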
My second concern is about STONITH: is fence_ipmilan applicable to the current
implementation (2 VMs with a single NIC each)?
Third one: how do I test that this HA failover indeed happens? I have been trying to
force the switch via a constraint like "pcs constraint location ClusterIP prefers
nfs2=INFINITY", or by disconnecting the NIC (exact commands below).
If somebody could share their experience and, why not, some sample configs, I would
appreciate it. Also, any additional feedback regarding the current configuration is
more than welcome.
Many thanks,
Mihai
PS. Although this is a really good book, I was not able to make it work :(
PPS. This is just a personal assessment in order to understand the power of these technologies.