On Wed, Dec 15, 2010 at 02:11:37PM +0000, Alistair Veitch wrote:
> Hi,
>
> I've got an environment of three servers running Solaris 10 x64 u9. Two of
> these servers (let's call them "A" and "B") are clustered using Veritas
> Cluster Server, and they NFS-share a filesystem. The share has no options
> set, so it's "rw" to the world.
>
> The third server (let's call it "C") NFS-mounts the share from the cluster
> via an entry in its dfstab. This mount also has no options set (although
> we've previously tried soft and hard mounts). The contents of
> /etc/default/nfs on this server has NFS_CLIENT_VERSMAX=3, otherwise all
> entries are default. We've also tried forcing NFS version 4 on the mount.
>
> The problem is this: if we fail over the NFS share from A to B, the NFS mount
> on C hangs during the failover as expected. Once the NFS server components
> are up on B, the mount recovers automatically within a second or so, as
> expected. However, if we perform a failback of the share from B to A, the
> mount on C takes approx 7 to 8 minutes to recover automatically. During this
> period, C can ping A and B, and the virtual address and name associated with
> the failover NFS share. Further failovers/failbacks, if performed without
> large time intervals between them, also cause the long recovery of the NFS
> mount on C. If however we wait a good while (perhaps 30 mins or more) before
> performing another failover test, the recovery from the hanging NFS mount on
> C is quick again, back to one second or so.
>
> Does anyone have any ideas why the NFS mount takes so long to recover from
> the hang under the scenarios described above?

Hi Alistair,

You could try to snoop the over-the-wire communication to see what exactly
is happening.

HTH.

-- 
Marcel Telka
RPE, Systems
_______________________________________________
nfs-discuss mailing list
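
A rough sketch of what the snoop suggestion might look like on C. The script only prints two commands: one to capture traffic to and from the failover address while a failback is performed, and one to read the capture back afterwards. The interface name (e1000g0), the hostname (nfs-vip) and the capture path are placeholders chosen for illustration, not details from the original report, so substitute the real values.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the original report): override as needed.
IFACE="${IFACE:-e1000g0}"                        # network interface on C
NFS_VIP="${NFS_VIP:-nfs-vip}"                    # virtual name of the NFS share
CAPTURE="${CAPTURE:-/var/tmp/nfs-failback.snoop}" # where to store the capture

# Command to start on C just before the failback; stop it with Ctrl-C
# once the mount has recovered.
capture_cmd() {
    echo "snoop -d $IFACE -o $CAPTURE host $NFS_VIP"
}

# Command to replay the saved capture with verbose summary lines.
review_cmd() {
    echo "snoop -i $CAPTURE -V"
}

capture_cmd
review_cmd
```

Running the printed capture command as root across one slow failback, then reviewing the file, should show whether C keeps sending requests during the 7 to 8 minutes and what, if anything, comes back.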
