I would like some opinions on the OCF RA I wrote. I needed to support an active-active setup for NFS, and googling found me no working solution, so I put one together. I read through these list archives and various resources around the 'net while putting this together. My testing is favorable so far, but I would like to ask the experts. I wrote up a description of my solution on my blog; the RA is linked from there. I have copied the text and link into this email. I am using Heartbeat 3 and Pacemaker on CentOS 5.4.
http://ben.timby.com/?p=109

------

I have need for an active-active NFS cluster. For review, an active-active cluster is two boxes that export two resources (one each). Each box acts as a backup for the other box's resource. This way, both boxes actively serve clients (albeit for different NFS exports).

The first problem I ran into with this setup is that the nfsserver OCF resource agent that comes with Heartbeat is not suitable. It works by stopping/starting the NFS server via its init script. For my situation, NFS will always be running; I just want to add/remove exports on failover.

Adding and removing exports is fairly easy under Linux, using the exportfs command:

$ exportfs -o rw,sync,mp 192.168.1.0/24:/mnt/fs/to/export

The options correspond to those you would place into /etc/exports, and the rest is the host:/path portion, also as it would appear in /etc/exports. To remove an export, you specify the following:

$ exportfs -u 192.168.1.0/24:/mnt/fs/to/export

Therefore, what I needed was an OCF RA that manages NFS exports using exportfs. I wrote one, and it is available at the link below.

http://ben.timby.com/pub/exportfs.txt

However, there are two remaining issues. The first is that when you export a file system via NFS, a unique fsid is generated for that file system. The client machines that mount the exported file system use this ID to generate handles to directories/files. The fsid is derived from the major/minor numbers of the device being exported. This is a problem for me, as the device being exported is a DRBD volume with LVM on top of it. This means that when the LVM OCF RA fails over the volume group, the major/minor numbers will change. In fact, the first device on my system had a minor of 4, and this was true on both nodes. If a resource migrates, it cannot receive minor 4, as the node's existing volume group already occupies it. This means the fsid of the exported file system changes, and all client file handles go stale after failover.

To fix this, each exported file system needs a unique fsid option passed to exportfs:

$ exportfs -o rw,sync,mp,fsid=1 192.168.1.0/24:/mnt/fs/to/export

Note that fsid=0 has a special meaning in NFSv4, so avoid it unless you have read the docs and understand its special use. I have taken care of this in my RA by generating a random fsid in case one is not already assigned. This random fsid is then written to the DRBD device and used on the other node when the file system is exported. This way the fsid is both unique and persistent (it remains the same on the other node after failover). I sketch this bookkeeping near the end of this post.

The other problem is that the /var/lib/nfs/rmtab file needs to be synchronized. This file contains the clients that have mounted the exported file system. Again, I handle this in my RA by saving the relevant rmtab entries onto the DRBD device and restoring them to the other node's rmtab file. I also remove these entries from the node on which the resource is stopped. This gives me a smooth failover of NFS from one node to the other and back again. (This, too, is sketched below.)

To use my RA, simply install it onto your cluster nodes at:

/usr/lib/ocf/resource.d/custom/exportfs

Then you can create a resource using that RA; it requires three parameters:

1. exportfs_dir - the directory to export.
2. exportfs_clientspec - the client specification to export to (e.g. 192.168.1.0/24).
3. exportfs_options - the options as you would specify them in /etc/exports.

If you provide an fsid in the exportfs_options param, that value will be honored; the random fsid is only generated when fsid is absent.
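For illustration, a primitive using this RA can be created with the Pacemaker crm shell along these lines (the resource name, paths and monitor interval here are hypothetical, and the colocation/ordering with your DRBD and Filesystem resources is left out):

$ crm configure primitive p_exportfs_fs1 ocf:custom:exportfs \
      params exportfs_dir="/mnt/fs/to/export" \
             exportfs_clientspec="192.168.1.0/24" \
             exportfs_options="rw,sync,mp" \
      op monitor interval="30s"

In a real setup you would colocate and order this primitive with the DRBD master and the Filesystem mount it exports, so the export always follows the data.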
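As for the fsid handling described above, the logic boils down to something like the following. This is a minimal sketch; the .fsid filename and the use of $RANDOM are illustrative, not lifted from the actual script:

# Reuse a persisted fsid if one exists on the DRBD-backed file system,
# otherwise generate one and persist it for the peer node to reuse.
FSID_FILE="${OCF_RESKEY_exportfs_dir}/.fsid"
if [ -f "$FSID_FILE" ]; then
    fsid=$(cat "$FSID_FILE")
else
    fsid=$(( (RANDOM % 65534) + 1 ))   # skip 0, which is special in NFSv4
    echo "$fsid" > "$FSID_FILE"
fi
exportfs -o "${OCF_RESKEY_exportfs_options},fsid=${fsid}" \
    "${OCF_RESKEY_exportfs_clientspec}:${OCF_RESKEY_exportfs_dir}"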
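The rmtab synchronization is similar in spirit. A sketch of the save/restore steps (the .rmtab backup filename is hypothetical; rmtab lines have the form host:path:count, hence the colon-delimited grep):

RMTAB=/var/lib/nfs/rmtab
BACKUP="${OCF_RESKEY_exportfs_dir}/.rmtab"   # lives on the DRBD device

# On stop: save this export's entries, then remove them from the local rmtab.
grep ":${OCF_RESKEY_exportfs_dir}:" "$RMTAB" > "$BACKUP"
grep -v ":${OCF_RESKEY_exportfs_dir}:" "$RMTAB" > "${RMTAB}.tmp" \
    && mv "${RMTAB}.tmp" "$RMTAB"

# On start: append the saved entries to the rmtab on the node taking over.
[ -f "$BACKUP" ] && cat "$BACKUP" >> "$RMTAB"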
This seems to work perfectly on my cluster running CentOS 5.4; I tested using an Ubuntu 9.10 client.

** Update **

I posted a new version of the OCF RA. The problem was that the old version only backed up rmtab when the resource was being stopped. Needless to say, this only covers the graceful failover scenario; if the service dies, the backup is never made. I have remedied this by spawning a process that continually backs up rmtab. This process is then killed when the resource is stopped. This should cover resource failures as well as graceful failovers (a sketch of the loop follows below).
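Conceptually, the spawned backup process looks something like this (the interval and pid file path are placeholders, not taken from the RA):

# Continually snapshot this export's rmtab entries onto the DRBD device,
# so the peer of a crashed node still sees a recent client list.
PIDFILE="/var/run/exportfs_rmtab_backup.pid"
(
    while true; do
        grep ":${OCF_RESKEY_exportfs_dir}:" /var/lib/nfs/rmtab \
            > "${OCF_RESKEY_exportfs_dir}/.rmtab"
        sleep 5
    done
) &
echo $! > "$PIDFILE"   # the stop action kills this pid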
