So, I'm not sure if this will be helpful, but it certainly can't hurt.

Last month I set up my first HA-NFS cluster, as the backend for a set of frontend PHP servers. I had some previous experience with the corosync/pacemaker/heartbeat/drbd suite: a RabbitMQ setup a year ago, plus the PHP frontend servers themselves, which I had initially set up with the suite before realizing that an NFS backend would be more appropriate. Suffice it to say that the suite is dense, particularly the Pacemaker crm configuration. It was a challenging week of poking, prodding, relentless Google searching and discovery to get it all working. I consulted all the documents others have mentioned, possibly more! The document that really got me into the 'functioning' realm was the LINBIT PDF 'Highly available NFS storage with DRBD and Pacemaker'. Even so, I made my own adjustments and modifications to the described setup to make it work.

On each server I had a spare LVM logical volume to use for DRBD. Initially, I thought that I *had to* use nested LVM to make this work ( http://www.drbd.org/users-guide/s-nested-lvm.html ). That accounted for a large part of the frustration. Eventually I learned (by trial and mostly error) that I didn't need nested LVM at all: I could simply use the existing logical volume as DRBD's backing disk, as is.
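For anyone wondering what "use them as is" amounts to in practice, here is a minimal sketch, assuming DRBD 8.3 (as in the package list below) and the `nfs-volume` resource from `resources.res` further down. The `run` wrapper and `init_drbd_on_lv` function are hypothetical helpers, not part of DRBD; with `DRY_RUN=1` they only print the commands. Creating the metadata and the filesystem destroys whatever is on the LV, so this is only for a genuinely spare volume.

```shell
#!/bin/bash
# Sketch: put DRBD directly on an existing, spare LV (no nested LVM).
# RES matches the resource in resources.res; DRY_RUN=1 only prints commands.
RES=${RES:-nfs-volume}

# Hypothetical wrapper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper. "both" runs on each node; "primary" on ONE node only.
init_drbd_on_lv() {
    case $1 in
        both)
            run drbdadm create-md "$RES"   # internal metadata, stored at the end of the LV
            run drbdadm up "$RES"          # attach the LV and connect to the peer
            ;;
        primary)
            # DRBD 8.3 syntax for the first promotion: this node becomes the
            # sync source and overwrites the peer's copy.
            run drbdadm -- --overwrite-data-of-peer primary "$RES"
            # The filesystem goes straight onto the DRBD device (ext4, as in the crm config)
            run mkfs.ext4 /dev/drbd1
            ;;
        *)
            echo "usage: init_drbd_on_lv both|primary" >&2
            return 2
            ;;
    esac
}
```

The intended order would be `init_drbd_on_lv both` on nfs-a and nfs-b, then `init_drbd_on_lv primary` on one of them, after which the resource is handed over to Pacemaker's `ocf:linbit:drbd` agent.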

When forcing a failover ('service corosync restart'), there are 15 to 20 seconds during which the frontend web servers falter, while the other node's corosync notices the failure and the filesystem is remounted on the other box. That's a tolerable pause for me. I could probably shorten it by tweaking the various timeouts in the corosync config, but I haven't touched those yet (I've been afraid of messing with it and breaking it!).
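To put a firmer number on that pause, a small probe like the one below could run on a frontend while the failover is forced. It is a sketch, not part of my setup: `max_write_gap_ms` is a hypothetical helper, the probe path is an assumption, and it relies on GNU `date`/`sleep` and on the frontends using a hard NFS mount, so that writes block rather than fail during the switchover.

```shell
#!/bin/bash
# Hypothetical helper, run on a frontend during a forced failover.
# Repeatedly rewrites a small probe file on the NFS mount and prints the
# longest gap (ms) between consecutive completed writes.
max_write_gap_ms() {
    local probe=$1 iterations=${2:-600} interval=${3:-0.1}
    local prev now gap max=0 i
    prev=$(date +%s%N)                     # GNU date: nanoseconds since the epoch
    for (( i = 0; i < iterations; i++ )); do
        # A write+close goes to the server (a stat could be answered from the
        # client's attribute cache). On a hard mount it blocks until the
        # server answers again, so a failover shows up as one long gap.
        date > "$probe" || return 1
        now=$(date +%s%N)
        gap=$(( (now - prev) / 1000000 ))
        (( gap > max )) && max=$gap
        prev=$now
        sleep "$interval"                  # GNU sleep accepts fractional seconds
    done
    echo "$max"
}
```

For example, `max_write_gap_ms /var/www/html/.failover-probe 600 0.1` (hypothetical path) started just before `service corosync restart`. Each gap includes the sleep interval, so a quiet run reports roughly 100 ms; a failover should stand out as a single gap of many seconds.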

Below are the relevant configs and info, or snippets thereof (eliding mostly default settings with '[...]'), in the hope that they'll be of value to someone. This is all on CentOS 5.8. Naturally, all configs shown are identical on the two servers.

[root@nfs-a ~]# uname -a
Linux nfs-a 2.6.18-308.4.1.el5 #1 SMP Tue Apr 17 17:08:00 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
================================================================
[root@nfs-a ~]# rpm -qa|grep drbd;rpm -qa|grep corosync;rpm -qa|grep heartbeat;rpm -qa|grep pacemaker
drbd83-8.3.13-2.el5.centos
kmod-drbd83-8.3.13-1.el5.centos
corosynclib-1.2.7-1.1.el5
corosync-1.2.7-1.1.el5
heartbeat-libs-3.0.3-2.3.el5
heartbeat-3.0.3-2.3.el5
pacemaker-libs-1.0.12-1.el5.centos
pacemaker-1.0.12-1.el5.centos
[root@nfs-a ~]#
================================================================
[root@nfs-a ~]# cat /etc/hosts
127.0.0.1               nfs-a localhost.localdomain localhost
10.255.20.58            nfs-a
10.255.20.59            nfs-b
10.255.20.204           nfs-shared
================================================================
[root@nfs-a ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
[...]
/dev/drbd1             20G  537M   18G   3% /srv/nfs/html
================================================================
[root@nfs-b ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
[...]
================================================================
[root@nfs-a ~]# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by [email protected], 2012-05-07 11:56:36
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:3755180 nr:3535432 dw:7290612 dr:156289 al:179 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
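For monitoring, the resource line above is easy to check from a script. The hypothetical `drbd_healthy` function below (not part of DRBD) succeeds only when the given minor is `cs:Connected` with `ds:UpToDate/UpToDate`. It treats a running resync as unhealthy and returns 2 if the minor isn't listed at all; the field layout is assumed to match the 8.3 output shown here.

```shell
#!/bin/bash
# Hypothetical check: exit 0 if DRBD minor N is connected and up to date on both sides.
# Usage: drbd_healthy [file, default /proc/drbd] [minor, default 1]
drbd_healthy() {
    local src=${1:-/proc/drbd} minor=${2:-1}
    # Resource lines look like:
    #  1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    awk -v m="$minor:" '
        $1 == m {
            found = 1
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)   # connection state
                if ($i ~ /^ds:/) ds = substr($i, 4)   # disk states local/peer
            }
        }
        END {
            if (!found) exit 2
            exit !(cs == "Connected" && ds == "UpToDate/UpToDate")
        }' "$src"
}
```

Something like `drbd_healthy || echo "drbd1 degraded"` from cron or a monitoring agent would then flag a lost peer or an unfinished resync.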
================================================================
[root@nfs-a ~]# cat /etc/drbd.d/global_common.conf
[...]
common {
        startup {
                wfc-timeout 0; degr-wfc-timeout 120;
        }
        disk {
                on-io-error detach;
        }
        syncer {
                rate 200M; al-extents 257;
        }
================================================================
[root@nfs-a ~]# cat /etc/drbd.d/resources.res
resource nfs-volume {
    device    /dev/drbd1;
    disk      /dev/VolGroup00/LogVol02; # the actual disk partition
    meta-disk internal;
# use canonical hostname for 'on <something>'
on nfs-a {
    address   10.255.20.58:7789;
  }
# use canonical hostname for 'on <something>'
on nfs-b {
    address   10.255.20.59:7789;
  }
}
================================================================
[root@nfs-a ~]# lvdisplay
 [...]
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol02
  VG Name                VolGroup00
  LV UUID                c46CsV-bOmj-6idR-I48H-7O3t-48e3-wKrfVh
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                19.53 GB
  Current LE             625
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
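A rough worked estimate of how long a full resync of this volume (for example the initial sync) would take. It assumes `rate 200M` in the syncer section above means about 200 MiB/s and takes the 19.53 GB (binary) LV size from `lvdisplay`; the real figure is bounded by the replication link and the disks, so treat the result as a lower limit. `resync_seconds` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: seconds for a full resync, size (GiB) * 1024 / rate (MiB/s),
# rounded up. Ignores protocol overhead, so it is a best case.
resync_seconds() {
    awk -v size_gib="$1" -v rate_mib="$2" 'BEGIN {
        t = size_gib * 1024 / rate_mib
        c = int(t)
        if (c < t) c++        # round up to a whole second
        print c
    }'
}
```

`resync_seconds 19.53 200` gives 100 seconds at the configured cap. If the replication link were gigabit Ethernet (an assumption; roughly 110 MiB/s usable), `resync_seconds 19.53 110` gives 182 seconds, so the link rather than the syncer rate would be the limit.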
================================================================
[root@nfs-a ~]# cat /etc/ha.d/ha.cf
[...]
# Communications
udpport 698     # Change last digit so that it doesn't
                # conflict with another instance of HA
bcast                          eth0
[...]
================================================================
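In /etc/lvm/lvm.conf there is *no* device filtering, contrary to the guide. (Such a filter matters when LVM is layered on top of DRBD, so that LVM doesn't see the same PV through both the backing device and /dev/drbdX; here ext4 sits directly on /dev/drbd1, so there is no PV on it for LVM to find.)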
================================================================
[root@nfs-a ~]# cat /etc/corosync/corosync.conf
 totem {
[...]
        interface {
                ringnumber: 0
                # The following three values need to be set based on your environment
                bindnetaddr: 10.255.20.0        # your local subnet
                mcastaddr: 226.94.1.58          # set last octet to last octet of one of the
                                                # servers' IPs. This prevents conflict with
                                                # other corosyncs that may be running on the same network
                mcastport: 5405
[...]
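The two address settings in that snippet follow a simple rule: `bindnetaddr` is the network address of the node's subnet, and the last octet of `mcastaddr` is borrowed from one node's IP so that neighbouring clusters don't collide. The hypothetical `totem_addrs` helper below derives both from an IP and prefix length. Note that corosync also uses `mcastport - 1` for sending, so two clusters sharing a multicast address should keep their `mcastport` values at least two apart.

```shell
#!/bin/bash
# Hypothetical helper: print totem bindnetaddr/mcastaddr for a node IP and prefix length,
# following the "last octet" convention from the corosync.conf comments.
totem_addrs() {
    local ip=$1 prefix=$2 a b c d o
    IFS=. read -r a b c d <<< "$ip"
    for o in "$a" "$b" "$c" "$d"; do
        # Reject anything that isn't a dotted quad of 0-255 values (10# avoids octal parsing)
        [[ $o =~ ^[0-9]{1,3}$ ]] && (( 10#$o <= 255 )) || { echo "bad IP: $ip" >&2; return 1; }
    done
    [[ $prefix =~ ^[0-9]{1,2}$ ]] && (( 10#$prefix <= 32 )) || { echo "bad prefix: $prefix" >&2; return 1; }

    local n=$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
    local mask=$(( (0xFFFFFFFF << (32 - 10#$prefix)) & 0xFFFFFFFF ))
    local net=$(( n & mask ))
    printf 'bindnetaddr: %d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) $(( (net >> 8) & 255 )) $(( net & 255 ))
    # Last octet of the node's IP, per the convention in the config comment
    printf 'mcastaddr: 226.94.1.%d\n' $(( 10#$d ))
}
```

For nfs-a, `totem_addrs 10.255.20.58 24` reproduces the two values in the snippet above: `bindnetaddr: 10.255.20.0` and `mcastaddr: 226.94.1.58`.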
================================================================
Finally, the working crm configuration. Note that I used different names for the primitives etc. (all uppercase), since the guide's all-lowercase names just made everything swim in front of my eyes. :)

node nfs-a
node nfs-b
primitive DAEMON lsb:nfs \
        op monitor interval="30" \
        meta target-role="Started"
primitive DRBD ocf:linbit:drbd \
        params drbd_resource="nfs-volume" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="15" role="Master" \
        op monitor interval="30" role="Slave"
primitive EXPORT ocf:heartbeat:exportfs \
        params fsid="1" directory="/srv/nfs/html" options="rw,mountpoint,no_root_squash" \
        clientspec="10.255.20.0/255.255.255.0" wait_for_leasetime_on_stop="true" \
        op monitor interval="0" timeout="40" \
        op start interval="0" timeout="40" \
        meta target-role="Started"
primitive FILESYSTEM ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/srv/nfs/html" fstype="ext4" \
        options="nobarrier,noatime" \
        op monitor interval="10" timeout="40" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        meta target-role="Started"
primitive VIRTUALIP ocf:heartbeat:IPaddr2 \
        params ip="10.255.20.204" broadcast="10.255.20.255" nic="eth0:1" cidr_netmask="24" \
        op monitor interval="30" \
        meta target-role="Started"
ms msDRBD DRBD \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" \
        is-managed="true" target-role="Master"
clone clDAEMON DAEMON
clone clEXPORT EXPORT
colocation NFSALL-on-msDRBD inf: clDAEMON clEXPORT FILESYSTEM VIRTUALIP msDRBD:Master
order msDRBD-before-FILESYSTEM inf: msDRBD:promote FILESYSTEM:start
order FILESYSTEM-before-clDAEMON inf: FILESYSTEM clDAEMON
order clDAEMON-before-clEXPORT inf: clDAEMON clEXPORT
order clEXPORT-before-VIRTUALIP inf: clEXPORT VIRTUALIP
property $id="cib-bootstrap-options" \
        dc-version="1.0.12-unknown" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1338327933"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"
================================================================
[root@nfs-a etc]# crm status
============
Last updated: Wed Jun  6 17:01:29 2012
Stack: openais
Current DC: nfs-a - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ nfs-a nfs-b ]

 FILESYSTEM     (ocf::heartbeat:Filesystem):    Started nfs-a
 VIRTUALIP      (ocf::heartbeat:IPaddr2):       Started nfs-a
 Master/Slave Set: msDRBD
     Masters: [ nfs-a ]
     Slaves: [ nfs-b ]
 Clone Set: clDAEMON
     Started: [ nfs-a ]
     Stopped: [ DAEMON:1 ]
 Clone Set: clEXPORT
     Started: [ nfs-a ]
     Stopped: [ EXPORT:1 ]
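After a failover it's handy to confirm that the colocation constraint did its job, i.e. that FILESYSTEM and VIRTUALIP landed on the new DRBD master. This sketch parses one snapshot of `crm_mon -1` output in the Pacemaker 1.0 format shown above; `resource_node` and `check_colocated` are hypothetical helpers, and the parsing would likely need adjusting for other crm_mon versions.

```shell
#!/bin/bash
# Hypothetical helper: print the node a primitive is "Started" on, reading
# crm_mon text from a file or stdin. Fails if the resource isn't started.
resource_node() {
    local rsc=$1 src=${2:--}
    awk -v r="$rsc" 'NF >= 3 && $1 == r && $(NF-1) == "Started" { print $NF; found = 1 }
        END { exit !found }' "$src"
}

# Hypothetical check: reads one crm_mon snapshot on stdin and succeeds only if
# FILESYSTEM and VIRTUALIP run on the node listed under "Masters:".
check_colocated() {
    local status master rsc
    status=$(cat)
    master=$(printf '%s\n' "$status" | awk '$1 == "Masters:" { print $3; exit }')
    [ -n "$master" ] || return 1
    for rsc in FILESYSTEM VIRTUALIP; do
        [ "$(printf '%s\n' "$status" | resource_node "$rsc")" = "$master" ] || return 1
    done
}
```

Usage would be along the lines of `crm_mon -1 | check_colocated && echo "NFS stack is on the DRBD master"`.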

--
Paul Theodoropoulos
www.anastrophe.com


_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
