Re: NFS server fail-over - how do you do it?

2004-06-01 Thread Andrea Venturoli
** Reply to note from adp [EMAIL PROTECTED] Mon, 31 May 2004 12:33:24 -0500


 I was thinking that 
 since NFS is udp-based, that if the primary NFS server failed, and the 
 secondary assumed the primary NFS server's IP address, that things would at 
 least return to normal (of course, any writes that had been in progress 
 would fail horribly). That doesn't seem to be the case. During a test we 
 killed the main NFS server and brought up the NFS IP as an alias on the 
 backup. Didn't work. Has anyone tried anything like this?

The idea makes me shiver, as I'm quite sure there would be data losses.

However, if you are so brave... have you tried freevrrpd?

The problem might be that clients still have that IP associated with the old MAC
address in their ARP tables. VRRP is a protocol designed to handle failovers, and it
should also deal with this by changing the IP *and* the MAC address of the card.
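
To make the stale-ARP point concrete: a gratuitous ARP announcement is an ordinary ARP frame whose sender and target IP are both the address being claimed, sent to the Ethernet broadcast address so that neighbours can refresh their caches. The following is only a sketch of how such a frame is laid out. The MAC and IP values are made up, and actually sending the frame needs a raw socket and root privileges, which are not shown.

```python
import socket
import struct

def gratuitous_arp_frame(mac: bytes, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    eth = broadcast + mac + struct.pack("!H", 0x0806)
    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP are ours; the target IP is the same address and the
    # target MAC is left zeroed
    arp += mac + ip_bytes + b"\x00" * 6 + ip_bytes
    return eth + arp

# Hypothetical values: a locally administered MAC and a documentation-range IP
frame = gratuitous_arp_frame(bytes.fromhex("020000000001"), "192.0.2.10")
```

Whether a neighbour honours an unsolicited announcement depends on its network stack, which is one reason VRRP moves a virtual MAC address together with the IP.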

 bye
av.


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: NFS server fail-over - how do you do it?

2004-05-31 Thread Matthew Seaman
On Sun, May 30, 2004 at 02:43:37AM -0500, adp wrote:
 I am running a FreeBSD 4.9-REL NFS server. Once every several hours our main
 NFS server replicates everything to a backup FreeBSD NFS server. We are okay
 with the gap in time between replication. What we aren't sure about is how
 to automate the fail-over between the primary to the secondary NFS server.
 This is for a web cluster. Each client mounts several directories from the
 NFS server.
 
 Let's say that our primary NFS server dies and just goes away. What then?
 Are you periodically doing a mount or a file look-up of a mounted filesystem
 to check if your NFS server died? If so are you just unmounting and
 remounting everything using the backup NFS server?
 
 Just curious how this problem is being solved.

If you're mounting those NFS partitions read/write, then there really
isn't a good solution for this problem[1] -- you need your NFS server up
and running 24x7.

If you are NFS mounting those partitions read-only, then you can in
principle construct a fail-over system between those servers.  Some
Unix OSes let you specify a list of servers in fstab(5) (eg. Solaris)
and clients will mount from one or other of them.  Unfortunately you
can't do that with standard NFS mounts under FreeBSD.  You could try
using VRRP -- see the net/freevrrpd port for example -- but I'm not
sure how well that would work if the system failed-over in the middle
of an IO transaction.

In any case -- certainly if your NFS partitions are read/write, but
also for read-only, perhaps the best compromise is to use the
automounter amd(8).  This certainly does help with the 'nightmare
filesystem' scenario, where loss of a server prevents the clients
doing anything, even rebooting cleanly.  You can create a limited and
rudimentary form of failover by using role-based hostnames in your
internal DNS -- eg nfsserv.example.com as a CNAME pointing at your
main server, and then modify the DNS when you need the failover to
occur.  It's a bit clunky and needs manual intervention, but it beats
having nothing at all.
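
As an illustration of the role-based hostname idea, the sketch below renders the zone-file line that points the role name at whichever server should currently hold the role. The host names nfs1.example.com and nfs2.example.com and the 60-second TTL are assumptions made for the example, and how the record is loaded into the DNS server is left out.

```python
def pick_target(primary: str, backup: str, primary_ok: bool) -> str:
    """Return the host that should hold the role name right now."""
    return primary if primary_ok else backup

def role_cname_record(role: str, target: str, ttl: int = 60) -> str:
    """Render a zone-file CNAME line pointing `role` at `target`."""
    if not target.endswith("."):
        target += "."  # fully qualify so the zone origin is not appended
    return f"{role}\t{ttl}\tIN\tCNAME\t{target}"

# Hypothetical hosts; the primary is assumed to have failed
target = pick_target("nfs1.example.com", "nfs2.example.com", primary_ok=False)
print(role_cname_record("nfsserv", target))
```

A short TTL keeps resolvers from caching the old answer for long. An NFS client resolves the server name when it mounts, so an existing mount keeps using the old address until it is remounted.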

Cheers,

Matthew 

[1] Well, I assume you haven't got the resources to set up a storage
array with multiple servers accessing the same disk sets.

-- 
Dr Matthew J Seaman MA, D.Phil.   26 The Paddocks
  Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey Marlow
Tel: +44 1628 476614  Bucks., SL7 1TH UK


Re: NFS server fail-over - how do you do it?

2004-05-31 Thread Chuck Swiger
adp wrote:
 One of my big problems right now is that if our primary NFS server goes down
 then everything using that NFS mount locks up. If I change to the mounted
 filesystem on the client then it stalls:
 
 # pwd
 /root
 # cd /nfs-mount-dir
 [locks]
 
 If I try to reboot the reboot fails as well since FreeBSD can't unmount the
 filesystem!?

Solaris provides mechanisms for NFS-failover for read-only NFS shares, but
FreeBSD doesn't seem to support that.  Besides, most people seem to want to
use read/write filesystems, which makes that solution not very useful for
most people's requirements.

The solution to the problem is to make very certain that your primary NFS
server does not go down, ever, period.  Reasonable people who identify a
mission-critical system such as a primary NFS server ought to be willing to
spend money to get really good hardware, have a UPS, and so forth to facilitate
the goal of 100% uptime.  A Sun E450 still makes a nice primary fileserver,
although NAS solutions like a NetApp or an Auspex (not cheap!) should also be
considered.

The other choice would be to switch from using NFS to using a distributed
filesystem which implements fileserver redundancy, such as AFS and its
successor, DFS.

--
-Chuck


Re: NFS server fail-over - how do you do it?

2004-05-31 Thread adp
We can live with the chance that a file write might fail as long as we can
switch over to another NFS server if the primary fails. So amd will help us
avoid the client hung issue? I will have to take a look. That is the worst
thing of all when it comes to a failed NFS server. You can't even remotely
reboot the NFS client! Someone has to power reset the damn thing. That's
bad.



Re: NFS server fail-over - how do you do it?

2004-05-31 Thread adp
Very useful information, thanks. We have a very stable NFS server, but I am
still working hard to put some redundancy into place. I was thinking that
since NFS is udp-based, that if the primary NFS server failed, and the
secondary assumed the primary NFS server's IP address, that things would at
least return to normal (of course, any writes that had been in progress
would fail horribly). That doesn't seem to be the case. During a test we
killed the main NFS server and brought up the NFS IP as an alias on the
backup. Didn't work. Has anyone tried anything like this?



Re: NFS server fail-over - how do you do it?

2004-05-31 Thread Chuck Swiger
adp wrote:
 We can live with the chance that a file write might fail as long as we can
 switch over to another NFS server if the primary fails.

Sorry, NFS simply won't work with the model of operation you've described.
There is no way to do fallback to a secondary NFS server if the primary goes 
down when using read/write shares, nor does there exist any way to push the 
changes made to a secondary fileserver back to the primary, even if you could 
convince the clients to fail-over in the first place.

Maybe Samba/CIFS would come closer to what you want, or else WebDAV over HTTP?
--
-Chuck


Re: NFS server fail-over - how do you do it?

2004-05-31 Thread Dan Nelson
In the last episode (May 31), adp said:
 Very useful information, thanks. We have a very stable NFS server,
 but I am still working hard to put some redundancy into place. I was
 thinking that since NFS is udp-based, that if the primary NFS server
 failed, and the secondary assumed the primary NFS server's IP
 address, that things would at least return to normal (of course, any
 writes that had been in progress would fail horribly). That doesn't
 seem to be the case. During a test we killed the main NFS server and
 brought up the NFS IP as an alias on the backup. Didn't work. Has
 anyone tried anything like this?

That should work, I believe.  NFS is stateless so as long as a server
starts responding to the client, it should wake up.  You may get stale
NFS handle errors on open files or ones not synched to the slave when
the master failed, but apart from that you should be okay.  Does a
tcpdump show any NFS traffic at all?
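
On the stale-handle point: an application that can tolerate it may simply reopen the file by path when the server reports a stale handle, since a fresh lookup obtains a new handle from whichever server now answers. A minimal sketch, assuming the file exists on the new server. The function name and the `opener` parameter are made up for illustration.

```python
import errno

def read_with_stale_retry(path, opener=open, retries=1):
    """Read `path`, reopening it if the server reports a stale file handle."""
    attempt = 0
    while True:
        try:
            with opener(path, "rb") as fh:
                return fh.read()
        except OSError as exc:
            # Re-raise anything other than ESTALE, or ESTALE once the
            # retry budget is used up
            if exc.errno != errno.ESTALE or attempt >= retries:
                raise
            attempt += 1
```

This does nothing for files that were never replicated to the backup. Those fail with an ordinary "no such file" error, which is passed through unchanged.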

I have a port of the heartbeat program (from the badly-named
www.linux-ha.org site) that automates the IP failover part that I will
be submitting soon.  1.2.1 actually works out of the box on FreeBSD,
but 1.2.2 has problems releasing the IP when you try to move an active
server to standby.

-- 
Dan Nelson
[EMAIL PROTECTED]


Re: NFS server fail-over - how do you do it?

2004-05-31 Thread scion+freebsd-questions
Couple of issues regarding failover.

1) If system B is going to take over system A's IP,
   it also needs to take its MAC address.  Else you
   have to wait for an ARP timeout.

   Some systems (all?) perform a gratuitous ARP reply
   when an interface comes up.  But some other systems
   ignore this if they already have an ARP entry, or if
   they weren't asking for the ARP in the first place.

2) The failed system must be made to stay failed, else
   there is hell to pay when it comes back and finds
   another system in the bed, er, server room!

   In a main/standby scenario, this is doable with some
   simple scripting.  Any more than that and you will
   need some dynamic voting algorithm support.

   A nice thing about *real* computers is that they have
   an RS-232 console port and can be made to stay down
   with a BRK.

   I believe the PC weasel will allow that, as well.

   A remote power controller can also serve this need.

3) One argument for run-levels in init was to keep a
   system at rl 2 monitoring the primary, then go to
   rl 3 if the primary fails.

   This, of course, can be done with flat rc.d, and
   entirely without it, as well.  But it made the 
   primary/hotstandby scheme trivial to set-up.

   Regardless of where you put it and what all it calls,
   make a single script that can be run from your monitor
   app once it decides the master is gone.  It ensures the
   primary is dead, starts the server processes, and screams
   like the dickens for help.

4) NFS may be stateless, but NFS over TCP is common
   nowadays, and it isn't.  Though, I believe the
   automounter can help with that.

5) NAS serving SAN is nice if you can afford all that 
   fiber term gear.  But you can do the same with a scsi
   raid array that has two host ports.  You don't even
   need the second host port if you can change the scsi
   initiator ID of one of the hosts.  Just keep your cable
   lengths as short as you can.

6) It is generally cheaper to buy than build, unless
   you have done it before.  The devil is in the details.

   I've done it before, and I'll buy every time.
   
   Given that, a plug for some friends of mine that have
   made this work in the pri/hs mode.

   www.nssolutions.com
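
The single takeover script from point 3 can be sketched as a fixed sequence: make sure the primary is dead, start the server processes, then call for help. The three callables below are placeholders for site-specific actions (for example a remote power controller, the NFS daemons and the IP alias, a pager). None of them comes from the thread.

```python
def take_over(fence_primary, start_services, alert):
    """Run the standby's takeover steps in order; return True on success."""
    if not fence_primary():
        # Never start serving while the old primary might still be alive
        alert("takeover aborted: could not confirm primary is down")
        return False
    start_services()
    alert("standby has taken over; primary needs attention")
    return True

# Dry run with stand-in actions that only record what would happen
log = []
take_over(lambda: True, lambda: log.append("services started"), log.append)
```

Fencing comes first so that a primary that comes back cannot find the standby already holding its address, which is the situation point 2 warns about.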

Cheers!
-sam


Re: NFS server fail-over - how do you do it?

2004-05-31 Thread horio shoichi
On Sun, 30 May 2004 02:43:37 -0500
adp [EMAIL PROTECTED] wrote:
 I am running a FreeBSD 4.9-REL NFS server. Once every several hours our main
 NFS server replicates everything to a backup FreeBSD NFS server. We are okay
 with the gap in time between replication. What we aren't sure about is how
 to automate the fail-over between the primary to the secondary NFS server.
 This is for a web cluster. Each client mounts several directories from the
 NFS server.
 
 Let's say that our primary NFS server dies and just goes away. What then?
 Are you periodically doing a mount or a file look-up of a mounted filesystem
 to check if your NFS server died? If so are you just unmounting and
 remounting everything using the backup NFS server?
 
 Just curious how this problem is being solved.
 
 

Have you looked into amd (or am-utils)?

I haven't used its failover feature, but it certainly does have it.



horio shoichi



Re: NFS server fail-over - how do you do it?

2004-05-30 Thread Mike Woods
On Sun, 30 May 2004 02:43:37 -0500
adp [EMAIL PROTECTED] wrote:

 Just curious how this problem is being solved.

I can't say I've ever looked into it myself, but I'd suggest an easy solution would be to
have a cron script run every now and again to ping the servers and change the
mounts depending on what the response is.

Also, if your backup system is bespoke and can be modified, you could use amd and have
the script read stored data on NFS server availability so it can decide where to
back up the data.
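
The selection step of such a cron job might look roughly like the sketch below. It assumes the servers accept NFS over TCP on port 2049, which a UDP-only server would not, and the host names are hypothetical. Changing the mounts once a server has been chosen is left out.

```python
import socket

def port_open(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def choose_server(servers, probe=port_open):
    """Return the first server, in preference order, that answers the probe."""
    for host in servers:
        if probe(host):
            return host
    return None

# Example with a stand-in probe that pretends only the backup is reachable
chosen = choose_server(["nfs1.example.com", "nfs2.example.com"],
                       probe=lambda host: host == "nfs2.example.com")
```

Probing the service port rather than sending an ICMP ping also catches the case where the host is up but the NFS service is not answering.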

-- 
Mike Woods
IT Technician


Re: NFS server fail-over - how do you do it?

2004-05-30 Thread adp
One of my big problems right now is that if our primary NFS server goes down
then everything using that NFS mount locks up. If I change to the mounted
filesystem on the client then it stalls:

# pwd
/root
# cd /nfs-mount-dir
[locks]

If I try to reboot the reboot fails as well since FreeBSD can't unmount the
filesystem!?

How do I stop this from happening?

I am using this to mount NFS filesystems:

# mount -o bg,intr,soft ...
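
For checking a mount without getting stuck in the same hang, one approach is to touch the mount point from a child process and give up after a timeout. This is a sketch under that assumption, not a tested monitoring tool. The mount point, the 5-second timeout and the use of `ls -d` as the probe are all choices made for the example.

```python
import subprocess

def mount_responds(path, timeout=5.0):
    """Return True if listing `path` completes successfully within `timeout` seconds."""
    proc = subprocess.Popen(
        ["ls", "-d", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Ask the kernel to kill the child but do not wait for it: a process
        # blocked on a dead NFS server may not exit until the server returns.
        proc.kill()
        return False

if __name__ == "__main__":
    print(mount_responds("/nfs-mount-dir"))
```

The caller never touches the mount itself, so it stays responsive even when the child does not.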
