Hello guys,
For two weeks I have been struggling with a DRBD & Pacemaker configuration in order to
get an HA NFS server.
I tried all the examples Google could show me, without success.
I have also read lots of threads on this mailing list and was not able to
end up with a working configuration either.
The thread "secundary not finish synchronizing" is interesting,
especially this quote:
"To be able to avoid DRBD data divergence due to cluster split-brain,
you'd need both. Stonith alone is not good enough, DRBD fencing
policies alone are not good enough. You need both."
but I am still not able to make it work.
Now that I have expressed my feelings about the products :), let me summarize
my setup:
2 identical VMs, each with an LVM volume and a SINGLE NIC
DRBD 9.0.9
# rpm -qa | grep drbd
drbd90-utils-9.1.0-1.el7.elrepo.x86_64
kmod-drbd90-9.0.9-1.el7_4.elrepo.x86_64

Pacemaker 1.1.16
# rpm -qa | grep pacemaker
pacemaker-1.1.16-12.el7_4.8.x86_64
pacemaker-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-cluster-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-cli-1.1.16-12.el7_4.8.x86_64

Corosync 2.4.0
# rpm -qa | grep corosync
corosynclib-2.4.0-9.el7_4.2.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
DRBD resource on both nodes:
# cat /etc/drbd.d/r0.res
resource r0 {
  net {
    # fencing resource-only; fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
  }
  protocol C;
  on nfs1 {
    device /dev/drbd0; disk /dev/mapper/vg_cdf-lv_cdf;
    address 10.200.50.21:7788; meta-disk internal;
  }
  on nfs2 {
    device /dev/drbd0; disk /dev/mapper/vg_cdf-lv_cdf;
    address 10.200.50.22:7788; meta-disk internal;
  }
}
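If I read the quote above correctly, the intention would be to enable both the handlers and the DRBD fencing policy, so the net section of r0.res would look something like this (just a sketch of my understanding; the rest of the file stays unchanged):

  net {
    fencing resource-and-stonith;   # DRBD fencing policy, on top of the crm-fence-peer handlers above
  }

For now that line is commented out in my config, as shown above.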
Everything is good up until now; I mounted the volume on both nodes and was able
to see the data being replicated.
The problem occurs with Pacemaker on top, because I was not able to
configure it to get a Master and a Slave instance, only a Master and a Stopped
one.
Here are the Pacemaker configs:
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.200.50.20 cidr_netmask=24 op monitor interval=30s

pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create Data ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
pcs -f drbd_cfg resource master DataClone Data master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f drbd_cfg constraint colocation add DataClone with ClusterIP INFINITY
pcs -f drbd_cfg constraint order ClusterIP then DataClone
pcs cluster cib-push drbd_cfg
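In case it is relevant, this is roughly how I check the state after pushing that CIB (a sketch; the resource and node names are the ones defined above):

systemctl is-enabled drbd   # should report "disabled" if only Pacemaker is supposed to start DRBD
drbdadm status r0           # expecting one Primary and one Secondary once the clone runs on both nodes
pcs status resources        # expecting Masters: [ nfs1 ] and Slaves: [ nfs2 ] under DataClone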
pcs cluster cib fs_cfg
pcs -f fs_cfg resource create DataFS Filesystem device="/dev/drbd0" directory="/var/vols/itom" fstype="xfs"
pcs -f fs_cfg constraint colocation add DataFS with DataClone INFINITY with-rsc-role=Master
pcs -f fs_cfg constraint order promote DataClone then start DataFS
pcs cluster cib-push fs_cfg
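A similar quick check for the filesystem part (again just a sketch, paths taken from the commands above):

pcs resource show DataFS   # confirm the Filesystem resource parameters
mount | grep drbd0         # on the master node: /dev/drbd0 mounted on /var/vols/itom as xfs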
pcs cluster cib nfs_cfg
pcs -f nfs_cfg resource create nfsd nfsserver nfs_shared_infodir=/var/vols/nfsinfo
pcs -f nfs_cfg resource create nfscore exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/core fsid=1999
pcs -f nfs_cfg resource create nfsdca exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/dca fsid=1999
pcs -f nfs_cfg resource create nfsnode1 exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/node1 fsid=1999
pcs -f nfs_cfg resource create nfsnode2 exportfs clientspec="*" options=rw,sync,anonuid=1999,anongid=1999,all_squash directory=/var/vols/node2 fsid=1999
pcs -f nfs_cfg constraint order DataFS then nfsd
pcs -f nfs_cfg constraint order nfsd then nfscore
pcs -f nfs_cfg constraint order nfsd then nfsdca
pcs -f nfs_cfg constraint order nfsd then nfsnode1
pcs -f nfs_cfg constraint order nfsd then nfsnode2
pcs -f nfs_cfg constraint colocation add nfsd with DataFS INFINITY
pcs -f nfs_cfg constraint colocation add nfscore with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsdca with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsnode1 with nfsd INFINITY
pcs -f nfs_cfg constraint colocation add nfsnode2 with nfsd INFINITY
pcs cluster cib-push nfs_cfg
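Once everything is started on the master, this is roughly how I verify the exports from a client (the client-side mount point below is just an example):

showmount -e 10.200.50.20                       # list the exports via the cluster VIP
mount -t nfs 10.200.50.20:/var/vols/core /mnt   # example mount of one export through the VIP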
pcs stonith create nfs1_fen fence_ipmilan pcmk_host_list="nfs1" ipaddr=100.200.50.21 login=user passwd=pass lanplus=1 cipher=1 op monitor interval=60s
pcs constraint location nfs1_fen avoids nfs1
pcs stonith create nfs2_fen fence_ipmilan pcmk_host_list="nfs2" ipaddr=100.200.50.22 login=user passwd=pass lanplus=1 cipher=1 op monitor interval=60s
pcs constraint location nfs2_fen avoids nfs2
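Given the start timeouts shown in the status below, I assume the agent can also be tested by hand from each node with something like this (same addresses and credentials as in the stonith resources):

fence_ipmilan -a 100.200.50.21 -l user -p pass -P -o status   # query the IPMI endpoint for nfs1
fence_ipmilan -a 100.200.50.22 -l user -p pass -P -o status   # query the IPMI endpoint for nfs2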
And here is the status of the cluster:
# pcs status
Cluster name: nfs-cluster
Stack: corosync
Current DC: nfs2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu Apr 26 13:31:20 2018
Last change: Thu Apr 26 09:10:44 2018 by root via cibadmin on nfs1

2 nodes configured
11 resources configured

Online: [ nfs1 nfs2 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):       Started nfs1
 Master/Slave Set: DataClone [Data]
     Masters: [ nfs1 ]
     Stopped: [ nfs2 ]
 DataFS     (ocf::heartbeat:Filesystem):    Started nfs1
 nfsd       (ocf::heartbeat:nfsserver):     Started nfs1
 nfscore    (ocf::heartbeat:exportfs):      Started nfs1
 nfsdca     (ocf::heartbeat:exportfs):      Started nfs1
 nfsnode1   (ocf::heartbeat:exportfs):      Started nfs1
 nfsnode2   (ocf::heartbeat:exportfs):      Started nfs1
 nfs1_fen   (stonith:fence_ipmilan):        Stopped
 nfs2_fen   (stonith:fence_ipmilan):        Stopped

Failed Actions:
* nfs1_fen_start_0 on nfs2 'unknown error' (1): call=97, status=Timed Out, exitreason='none',
    last-rc-change='Thu Apr 26 09:10:45 2018', queued=0ms, exec=20009ms
* nfs2_fen_start_0 on nfs1 'unknown error' (1): call=118, status=Timed Out, exitreason='none',
    last-rc-change='Thu Apr 26 09:11:03 2018', queued=0ms, exec=20013ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
So, with the above config, I see DRBD started on the "promoted" master node but
stuck in a connecting state, because DRBD on the "slave" is not running at all.
This is my first concern: how do I instruct Pacemaker to start the DRBD processes on
both hosts/VMs at cluster startup, so that I get a real Master/Slave pair and the
synchronization can happen? Right now I have to manually start DRBD on the slave
before the remaining resources get deployed/started, so there is no automation/resilience, etc.
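To be precise, by "manually start" I mean something like the following on the node where the clone instance shows as Stopped:

drbdadm up r0       # bring the DRBD resource up by hand, outside of Pacemaker
drbdadm status r0   # after this the sync starts and the remaining resources follow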
My second concern is about STONITH: is fence_ipmilan applicable to the current
implementation (2 VMs with a single NIC each)?
Third one: how do I test that this HA failover indeed happens? I have been trying to
force the switch via a constraint like "pcs constraint location ClusterIP prefers
nfs2=INFINITY", or by disconnecting the NIC (exact commands below).
If somebody could share their experience and, why not, some sample configs, I would
appreciate it. Also, any additional feedback regarding the current configuration is
more than welcome.
Many thanks,
Mihai
PS. Although this is a really good book, I was not able to make it work :(
PPS. This is just a personal assessment in order to understand the power of these technologies.