--- On Wed, 2/3/10, Rick Macklem <[email protected]> wrote:

> From: Rick Macklem <[email protected]>
> Subject: Re: Zombie NFS writing from FreeBSD clients to FreeBSD 8.0 server 
> with ZFS
> To: "alan bryan" <[email protected]>
> Date: Wednesday, February 3, 2010, 8:02 AM
> 
> 
> On Tue, 2 Feb 2010, alan bryan wrote:
> 
> > I've tried different network driver igb->em, UDP->TCP for NFS,
> > enabling NFS locking on the server/clients (lockd, statd).
> 
> > I'm out of ideas so hoping this tcpdump sheds light on how it's
> > getting stuck in this loop.
> 
> You could try the experimental server, just to see if that has any
> effect.  Either set nfsv4_server_enable="YES" or add "-e" to both
> nfs_server_flags and mountd_flags.
> 
> Note that the server will handle NFSv3, so you don't need to use NFSv4
> mounts.
> 
> rick
> 

Thanks - I might give that a try.

This was initially happening only on our production stack, which made it 
difficult to try things while troubleshooting.  I've since been able to get 
it to happen on our dev stack too.

Basically, I have about 70 mounts from the clients: 70 or so separate ZFS 
filesystems, each exported via sharenfs.  This appears to work well at first.  
After some traffic and some time (less than a day) the zombie writes start 
occurring.  So, on dev we enabled dtrace (not at all familiar with it, 
unfortunately) and tried to get this to happen.  When it did happen we could 
see some patterns in the calls which match up with the repeating conversations 
witnessed in the tcpdumps.  While this is occurring, zpool iostat shows 
nothing being written to the disks.  So, it appears that the client requests 
a write, NFS takes the request and asks ZFS, which replies with some error 
(from its cache?), and then it's back to the client again.  So, I'm starting 
to lean toward this being more of a ZFS issue than an NFS one, but I'm still 
not sure.

We've read the recommendations about disabling the ZIL for ZFS/NFS and that 
sounds a bit scary.  Instead, we've bought some Intel X25-E SSDs to mirror as 
a log device for the pool, to see if that makes any sort of difference.  
(The thinking here is that this now looks like it might be a ZFS issue, and 
that the speed of the SSDs plus the different code path used with a dedicated 
log device might help us avoid the issue.)
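
For what it's worth, here is a rough sketch of how we'd expect to add the 
mirrored log device.  The pool name "tank" and the device names da6/da7 are 
placeholders, not our real layout:

```shell
# Assumed names: pool "tank", SSDs at da6 and da7 (placeholders only).
# Attach the two SSDs to the pool as a mirrored dedicated log (ZIL) device.
zpool add tank log mirror da6 da7

# Confirm that a "logs" section containing the mirror now shows up.
zpool status tank

# Watch per-vdev activity every 5 seconds to see whether the log
# device is actually taking writes.
zpool iostat -v tank 5
```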

So, if the SSDs don't change the behavior I may give the experimental NFS 
server a try to see if it helps.
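
As I understand Rick's suggestion, the change on the server would look 
something like the following in /etc/rc.conf.  This is only a sketch: the 
flags other than "-e" are assumed values rather than copied from our config, 
and only one of the two options should be needed:

```shell
# /etc/rc.conf on the NFS server (sketch; pick ONE of the two options)

# Option 1: turn on the experimental server via the knob Rick named.
nfs_server_enable="YES"
nfsv4_server_enable="YES"

# Option 2: add "-e" to both flag variables instead.
# The remaining flags are assumed values; keep whatever is already set
# and just append "-e".
#nfs_server_flags="-u -t -n 4 -e"
#mountd_flags="-r -e"
```

Per Rick's note, the clients could keep their existing NFSv3 mounts either 
way.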

Thanks,
Alan



      