Hello all,

I'm testing a new fileserver running OI 151a, and I've run into a problem: an NFS4-mounted filesystem on a Linux client stops responding. This happens after running a filebench workload on the client for several minutes. Metadata operations (ls, stat, rm, mkdir) still work, but anything that involves file contents (e.g. cat) blocks indefinitely. Getting out of this state requires restarting the nfs service on the server (and waiting 2 minutes for recovery). The good news is that the problem is reproducible; details below.

When the problem occurs, snoop shows the following being repeated:

  0.00009 basil.cs.ru.nl -> thyme.cs.ru.nl NFS C 4 () PUTFH FH=8775 SAVEFH OPEN 00000614 OT=NC SQ=0 CT=N AC=RW DN=N OO=3271 GETFH GETATTR 10011a 30a23a...
  0.00022 thyme.cs.ru.nl -> basil.cs.ru.nl NFS R 4 () NFS4ERR_STALE_CLIENTID PUTFH NFS4_OK SAVEFH NFS4_OK OPEN NFS4ERR_STALE_CLIENTID
  0.00012 basil.cs.ru.nl -> thyme.cs.ru.nl NFS C 4 () RENEW CL=654ee52ddc
  0.00002 thyme.cs.ru.nl -> basil.cs.ru.nl NFS R 4 () NFS4_OK RENEW NFS4_OK
  0.00008 basil.cs.ru.nl -> thyme.cs.ru.nl NFS C 4 () PUTFH FH=8775 SAVEFH OPEN 00000614 OT=NC SQ=0 CT=N AC=RW DN=N OO=3208 GETFH GETATTR 10011a 30a23a...
  0.00019 thyme.cs.ru.nl -> basil.cs.ru.nl NFS R 4 () NFS4ERR_STALE_CLIENTID PUTFH NFS4_OK SAVEFH NFS4_OK OPEN NFS4ERR_STALE_CLIENTID
  0.00012 basil.cs.ru.nl -> thyme.cs.ru.nl NFS C 4 () RENEW CL=654ee52ddc
  0.00002 thyme.cs.ru.nl -> basil.cs.ru.nl NFS R 4 () NFS4_OK RENEW NFS4_OK
  ... and so on.

So the client gets an NFS4ERR_STALE_CLIENTID on OPEN, successfully renews its clientid, and then immediately gets the same error again on the next OPEN. This seems to be the same problem as described in http://thread.gmane.org/gmane.linux.nfs/44449 ; the conclusion of that thread was that this is not a bug in the Linux client.

The filebench workload that reproduces this behaviour is just the creation/reuse of a fileset with many small files (no I/O flowops are needed). The problem always seems to occur after about 1 million NFS4 OPEN ops.

The same workload runs without issues when the client is OI 151a, or the server is Linux, or over NFS3. I don't think it is hardware related, because I get the same behaviour with Xen PV domains.

Is this a known problem, or should I report it as a bug? Is there anything else I can do to help debug this?
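
For anyone comparing against their own capture, the loop above could be summarized with something like the following rough sketch. It assumes the snoop summary output has been saved as plain text with one packet per line; the function name summarize_snoop and the file name snoop.txt are made up for illustration and are not part of any tool.

```shell
# Hypothetical helper (not part of snoop): count the server replies that make
# up the loop in a saved text capture, one snoop summary line per packet.
summarize_snoop() {
    # $1: path to the saved text capture
    awk '
        # Replies (NFS R) in which the OPEN op failed with STALE_CLIENTID
        / NFS R / && /OPEN NFS4ERR_STALE_CLIENTID/ { stale++ }
        # Replies in which the RENEW op succeeded
        / NFS R / && /RENEW NFS4_OK/               { renew++ }
        END { printf "stale_open=%d renew_ok=%d\n", stale, renew }
    ' "$1"
}
```

Running `summarize_snoop snoop.txt` on captures taken a few seconds apart should show both counts growing at roughly the same rate if the client is stuck in the OPEN/RENEW loop described above.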

Regards,

Kasper Brink


Steps to reproduce
==================

# On SERVER:
# (Ramdisk-based pool is fastest, but disk-based works too)

ramdiskadm -a tempdisk 256m
zpool create temppool /dev/ramdisk/tempdisk
zfs set sharenfs=rw=$CLIENT,root=$CLIENT temppool

# On CLIENT:
# I used Debian Squeeze (6.0.3), but I expect other distros will work as well.
# uname -a : Linux basil 2.6.32-5-xen-amd64 #1 SMP Mon Oct 3 07:53:54 UTC 2011 x86_64 GNU/Linux
# dpkg -l nfs-common : nfs-common 1:1.2.2-4 NFS support files common to client and server

# Get filebench, either from distro, or download:
# http://sourceforge.net/projects/filebench/files/filebench/filebench-1.4.9.1
# untar; ./configure && make && make install

mkdir /mnt/temppool
mount -t nfs4 -o rw,sync,hard $SERVER:/temppool /mnt/temppool

# Check that filesystem is writeable
touch /mnt/temppool/foo

# Save nfs4test.f (below) to a file
for i in $(seq 15); do echo ===== $i $(date); filebench -f nfs4test.f; done

# The NFS4 mount should become unresponsive around iteration 8 or 9...

############################################################
# nfs4test.f (Filebench workload)
############################################################
set $dir=/mnt/temppool
set $nfiles=128k
set $filesize=1k

define fileset name=nfs4test,path=$dir,size=$filesize,entries=$nfiles,
               filesizegamma=0,dirwidth=1000,prealloc,reuse

define process name=dummy,instances=1
{
  thread name=dummy,memsize=1m,instances=1
  {
    flowop finishoncount name=finishoncount,value=0
  }
}

set mode quit firstdone
run
############################################################
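
To timestamp the moment the mount stops serving file contents, a small probe could be run from a second shell on the client while the filebench loop is going. This is only a sketch: the name probe_read and the 10-second default are arbitrary choices, and it assumes that a cat blocked on the NFS4 mount can still be terminated by the signal that timeout(1) sends, which has not been verified for this particular failure.

```shell
# Hypothetical helper: succeed if the contents of a file can be read within a
# time limit, fail otherwise (timeout exits non-zero when it kills cat).
probe_read() {
    # $1: file to read; $2: time limit in seconds (default 10, arbitrary)
    timeout "${2:-10}" cat "$1" > /dev/null 2>&1
}
```

For example, in a second shell:

```shell
while probe_read /mnt/temppool/foo 10; do sleep 5; done
echo "read of /mnt/temppool/foo blocked or failed at $(date)"
```

The probe reads the file created by the touch step, so it exercises an open and a read rather than only metadata operations, which continue to work in the hung state.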
