Hello all,

I'm testing a new fileserver running OI 151a, and I've run into a problem
where an NFS4-mounted filesystem on a Linux client stops responding. This
happens after running a filebench workload on the client for several
minutes. Metadata operations (ls, stat, rm, mkdir) still work, but anything
that involves file contents (e.g. cat) blocks indefinitely. Getting out of
this state requires restarting the nfs service on the server (and waiting
two minutes for recovery). The good news is that the problem is
reproducible; details below.
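
Concretely, once the hang has occurred it looks like this (foo is the test
file from the steps below; the restart is what I mean by "restarting the
nfs service"):

    ls /mnt/temppool           # metadata: returns immediately
    stat /mnt/temppool/foo     # metadata: returns immediately
    cat /mnt/temppool/foo      # data: blocks indefinitely

    # on the server, then wait ~2 minutes for the grace period:
    svcadm restart network/nfs/server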

When the problem occurs, snoop shows the following being repeated:

0.00009 basil.cs.ru.nl -> thyme.cs.ru.nl NFS C 4 () PUTFH FH=8775 SAVEFH OPEN 00000614 OT=NC SQ=0 CT=N AC=RW DN=N OO=3271 GETFH GETATTR 10011a 30a23a...
0.00022 thyme.cs.ru.nl -> basil.cs.ru.nl NFS R 4 () NFS4ERR_STALE_CLIENTID PUTFH NFS4_OK SAVEFH NFS4_OK OPEN NFS4ERR_STALE_CLIENTID
0.00012 basil.cs.ru.nl -> thyme.cs.ru.nl NFS C 4 () RENEW CL=654ee52ddc
0.00002 thyme.cs.ru.nl -> basil.cs.ru.nl NFS R 4 () NFS4_OK RENEW NFS4_OK
0.00008 basil.cs.ru.nl -> thyme.cs.ru.nl NFS C 4 () PUTFH FH=8775 SAVEFH OPEN 00000614 OT=NC SQ=0 CT=N AC=RW DN=N OO=3208 GETFH GETATTR 10011a 30a23a...
0.00019 thyme.cs.ru.nl -> basil.cs.ru.nl NFS R 4 () NFS4ERR_STALE_CLIENTID PUTFH NFS4_OK SAVEFH NFS4_OK OPEN NFS4ERR_STALE_CLIENTID
0.00012 basil.cs.ru.nl -> thyme.cs.ru.nl NFS C 4 () RENEW CL=654ee52ddc
0.00002 thyme.cs.ru.nl -> basil.cs.ru.nl NFS R 4 () NFS4_OK RENEW NFS4_OK
... and so on.
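
For reference, I captured the trace above on the server with something like
the following (the interface name is just from my setup):

    snoop -d e1000g0 port 2049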

So the client gets an NFS4ERR_STALE_CLIENTID on OPEN, successfully renews
its clientid with RENEW, and then immediately gets the same error again on
the retried OPEN. This seems to be the same problem as described in
http://thread.gmane.org/gmane.linux.nfs/44449 ; the conclusion of that
thread was that this is not a bug in the Linux client.
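
If it helps, I can also watch the server side with DTrace; here is a rough
sketch using the illumos nfsv4 provider (I'm not certain the probe argument
layout is identical on OI 151a):

    # count OPEN and RENEW results by nfsstat4 status code
    # (10022 = NFS4ERR_STALE_CLIENTID)
    dtrace -n '
    nfsv4:::op-open-done  { @open[args[2]->status]  = count(); }
    nfsv4:::op-renew-done { @renew[args[2]->status] = count(); }'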

The filebench workload that reproduces this behaviour just creates/reuses
a fileset with many small files (no I/O flowops are needed). The problem
always seems to occur after about one million NFS4 OPEN ops. The same
workload runs without issues when the client is OI 151a, when the server
is Linux, or over NFS3. I don't think it is hardware-related, because I
get the same behaviour with Xen PV domains.
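
To see how many OPENs have gone by, I check the client-side counters before
and after each filebench run (the "open" counter under "Client nfs v4"):

    nfsstat -c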

Is this a known problem, or should I report it as a bug? Is there anything
else I can do to help debug this?

Regards,

Kasper Brink



     Steps to reproduce
     ==================

# On SERVER:

# (Ramdisk-based pool is fastest, but disk-based works too)
ramdiskadm -a tempdisk 256m 
zpool create temppool /dev/ramdisk/tempdisk
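# ($CLIENT is the Linux client's hostname or IP address)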
zfs set sharenfs=rw=$CLIENT,root=$CLIENT temppool


# On CLIENT:

# I used Debian Squeeze (6.0.3), but I expect other distros will work as well.
# uname -a : Linux basil 2.6.32-5-xen-amd64 #1 SMP Mon Oct 3 07:53:54 UTC 2011 x86_64 GNU/Linux
# dpkg -l nfs-common :  nfs-common  1:1.2.2-4  NFS support files common to client and server

# Get filebench, either from distro, or download: 
#  http://sourceforge.net/projects/filebench/files/filebench/filebench-1.4.9.1
#  untar; ./configure && make && make install 

mkdir /mnt/temppool
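# ($SERVER is the OI fileserver's hostname or IP address)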
mount -t nfs4 -o rw,sync,hard $SERVER:/temppool /mnt/temppool
# Check that filesystem is writeable
touch /mnt/temppool/foo

# Save nfs4test.f (below) to a file

for i in $(seq 15); do echo ===== $i $(date); filebench -f nfs4test.f; done
# The NFS4 mount should become unresponsive around iteration 8 or 9...
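
# A quick check for the stuck state, if you don't want to watch the loop by
# hand (timeout is from coreutils; foo is the test file created above):
timeout 10 cat /mnt/temppool/foo || echo "data ops appear hung"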


############################################################
# nfs4test.f   (Filebench workload)
############################################################

set $dir=/mnt/temppool
set $nfiles=128k
set $filesize=1k

define fileset name=nfs4test,path=$dir,size=$filesize,entries=$nfiles,
               filesizegamma=0,dirwidth=1000,prealloc,reuse
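
# The dummy process does no I/O; it exits right away (finishoncount plus
# "quit firstdone" below), so a run only creates/reuses the fileset itself.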

define process name=dummy,instances=1
{
  thread name=dummy,memsize=1m,instances=1
  {
    flowop finishoncount name=finishoncount,value=0
  }
}

set mode quit firstdone
run

############################################################

