On Sun, May 30, 2004 at 02:43:37AM -0500, adp wrote:
> I am running a FreeBSD 4.9-REL NFS server. Once every several hours our main
> NFS server replicates everything to a backup FreeBSD NFS server. We are okay
> with the gap in time between replications. What we aren't sure about is how
> to automate the fail-over from the primary to the secondary NFS server.
> This is for a web cluster. Each client mounts several directories from the
> NFS server.
>
> Let's say that our primary NFS server dies and just goes away. What then?
> Are you periodically doing a mount or a file look-up of a mounted filesystem
> to check whether your NFS server died? If so, are you just unmounting and
> remounting everything using the backup NFS server?
>
> Just curious how this problem is being solved.
If you're mounting those NFS partitions read/write, then there really isn't
a good solution to this problem[1] -- you need your NFS server up and
running 24x7. If you are mounting those partitions read-only, then you can
in principle construct a fail-over system between the servers. Some Unix
OSes let you specify a list of servers in fstab(5) (e.g. Solaris), and
clients will mount from one or other of them. Unfortunately you can't do
that with standard NFS mounts under FreeBSD.

You could try using VRRP -- see the net/freevrrpd port, for example -- but
I'm not sure how well that would work if the system failed over in the
middle of an I/O transaction.

In any case -- certainly if your NFS partitions are read/write, but also
for read-only -- perhaps the best compromise is to use the automounter,
amd(8). That certainly helps with the 'nightmare filesystem' scenario,
where the loss of a server prevents the clients from doing anything, even
rebooting cleanly.

You can also create a limited and rudimentary form of failover by using
role-based hostnames in your internal DNS -- e.g. nfsserv.example.com as a
CNAME pointing at your main server -- and then modifying the DNS when you
need the failover to occur. It's a bit clunky and needs manual
intervention, but it beats having nothing at all.

	Cheers,

	Matthew

[1] Well, I assume you haven't got the resources to set up a storage
    array with multiple servers accessing the same disk sets.

-- 
Dr Matthew J Seaman MA, D.Phil.                       26 The Paddocks
                                                      Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey         Marlow
Tel: +44 1628 476614                                  Bucks., SL7 1TH UK
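P.S. To make the amd(8) suggestion a bit more concrete: an amd map can list
more than one location for a key, and amd will mount from the first server
that responds. The hostnames and export paths below are invented, and the
syntax is from memory, so treat this as a sketch rather than a tested
configuration:

```
# /etc/amd.map -- sketch only; nfs1/nfs2 and /export/data are made up
/defaults    type:=nfs;opts:=ro,soft,intr
data         rhost:=nfs1.example.com;rfs:=/export/data \
             rhost:=nfs2.example.com;rfs:=/export/data
```

With something like amd_flags="-a /.amd_mnt -l syslog /nfs /etc/amd.map"
in rc.conf, clients would then reference /nfs/data and let amd pick a
live server. Check the amd(8) and amd.conf(5) manual pages for the exact
option spellings before relying on any of this.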
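P.P.S. The DNS/role-hostname approach still leaves you with the remount
step on each client. The script below is only a rough illustration of what
that intervention might look like: the hostnames, export path, mount point
and the rpcinfo probe are all my inventions, and the script deliberately
only prints the commands it would run rather than executing them.

```shell
#!/bin/sh
# Sketch of a manual NFS fail-over helper. All names here are
# hypothetical; adjust for your own servers and mount points.
PRIMARY=${PRIMARY:-nfs1.example.com}
BACKUP=${BACKUP:-nfs2.example.com}
EXPORT=${EXPORT:-/export/data}
MNT=${MNT:-/mnt/data}
# How we probe the primary: rpcinfo asks its NFS service over UDP.
# Overridable (e.g. CHECK_CMD=false) so the logic can be exercised
# without a real NFS server.
CHECK_CMD=${CHECK_CMD:-"rpcinfo -u $PRIMARY nfs"}

if $CHECK_CMD >/dev/null 2>&1; then
    echo "primary $PRIMARY answers; no action needed"
else
    # Dry run only: print the commands an operator would run.
    echo "primary $PRIMARY is down; would fail over to $BACKUP:"
    echo "  umount -f $MNT"
    echo "  mount -t nfs -o ro $BACKUP:$EXPORT $MNT"
fi
```

A forced umount(8) is needed because a hung NFS server leaves processes
blocked in the mount; whether -f succeeds cleanly on 4.9-REL is something
you'd want to test before an outage, not during one.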