Hello, the short story first: aufs on the server, exported via NFS as nfsroot for the clients.
The symptom on the client (first login after boot):

root@blade-008:~# la
ls: cannot open directory .: Stale NFS file handle
root@blade-008:~# cd .
root@blade-008:~# la
total 40
drwxr-xr-x  4 root root 4096 Feb  4  2015 .
drwxr-xr-x 35 root root 4096 Feb  4  2015 ..
drwx------  2 root root 4096 Jan 25  2015 .aptitude
-rw-r--r--  1 root root  907 Jan 25  2015 .bashrc
-rw-r--r--  1 root root    0 Jan 31  2015 kilroy.was.here
....

It is only partially reproducible. I don't think it is a "classical" NFS stale-handle problem, because the exports are prepared minutes before and are not changed while the clients boot. I rather suspect a latency problem, since the aufs mounts and the NFS exports are generated within the same perl script. Could it be that the aufs mount is forked off by the script and not yet completed before the NFS export is called? How can I find out? How can I avoid it? Other explanations?

-----------------------
some more details:

I am trying to build a beowulf-style cluster from old server hardware. I have a server "cruncher" and some clients ("blade0xx") attached to it. The clients are diskless and are booted via TFTP / PXE / NFS. This works fine for a read-only nfsroot and locks up, as expected, for a shared read-write nfsroot.

The idea now is to have aufs running on the _server_ to build an individual root file system for each client and export it via NFS. There are some HowTos around for running aufs on the _client_ and layering a ramdisk over a shared read-only nfsroot. That is not what I want, because I'd like to keep all configuration on the server, keep changes across reboots, and be able to inspect /var/log after whatever happened. So I decided to run aufs on the server - (at least) one nfsroot for every client.

Instead of configuring aufs and NFS through cumbersome and error-prone line-by-line copy-and-edit config files, I build them "on the fly" with a perl script: "mount -t aufs ..." and "exportfs ..." are called repeatedly in a loop via the perl system() command.

Another script translates /sys/fs/aufs/ into human-readable output, which looks like this:

none on /cluster/mp/nfsr/aufs_008 type aufs (rw,relatime,si=b8b59f115bf2cf56)
0 rw id=64 path=/cluster/nfs/nfsroot/wheezy_cow/cow_008
1 ro id=65 path=/cluster/nfs/nfsroot/wheezy_root_config
2 ro id=66 path=/cluster/nfs/nfsroot/wheezy_root_mask
3 ro id=67 path=/cluster/nfs/nfsroot/wheezy
xino: /cluster/nfs/nfsroot/wheezy_cow/cow_008/.aufs.xino

- layer 3 is the basic Debian installation, copied from an HD Debian setup
- layer 2 masks, among others, /root/some.files and /var/log;
  it is mounted as "ro+wh" (not visible in /sys/fs/aufs/)
- layer 1 is planned to be filled with different configurations, which
  may be switched without changing the underlying installation, simply
  by changing the mount path
- layer 0 is the copy-on-write layer, different for each client.

When I inspect the individual layers and the aufs on the server, everything looks as intended:

la /cluster/nfs/nfsroot/wheezy_root_mask/root/
total 8
drwx------   2 root root 4096 Jan 31 19:43 .
drwxr-xr-x   7 root root 4096 Feb  1 22:11 ..
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh..bash_history
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.foobar
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-002.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-006.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-008.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo.out

la /cluster/nfs/nfsroot/wheezy_root_config/root/
total 8
drwxr-xr-x 2 root root 4096 Jan 31 19:57 .
drwxr-xr-x 5 root root 4096 Feb  1 22:11 ..
-rw-r--r-- 1 root root    0 Jan 31 19:57 kilroy.was.here

la /cluster/mp/nfsr/aufs_008/root/
total 40
drwxr-xr-x  4 root root 4096 Feb  4 23:02 .
drwxr-xr-x 35 root root 4096 Feb  4 23:02 ..
drwx------  2 root root 4096 Jan 25 08:33 .aptitude
-rw-r--r--  1 root root  907 Jan 25 12:31 .bashrc
-rw-r--r--  1 root root    0 Jan 31 19:57 kilroy.was.here
-rw-------  1 root root   66 Jan 25 13:02 .lesshst
-rw-r--r--  1 root root  140 Nov 19  2007 .profile
drwx------  2 root root 4096 Jan 30 22:22 .ssh
-rw-------  1 root root 4769 Jan 25 12:31 .viminfo
-rw-------  1 root root  404 Feb  4 23:02 .Xauthority

root@cruncher:/cluster/etc/scripts/available# exportfs -v
....
/cluster/mp/nfsr/aufs_008
    192.168.130.8(rw,wdelay,crossmnt,no_root_squash,no_subtree_check,fsid=158,sec=sys,rw,no_root_squash,no_all_squash)
...

The individual nfsroots are mapped to the clients by IP-specific PXE configs.

The problem is not completely reproducible, but it appears that mid-range client counts show it, while low and high counts do not.

If my suspicion is right that I export the aufs before it is completely built, the best thing to do would be to check for completion before exportfs is called. How could I do this? Would it help to combine both commands on a single command line, called by perl system(), like

mount -t aufs ...... ; exportfs /my/aufs/mount ....

Is the mount blocking, or can it be configured to be?

http://perldoc.perl.org/functions/system.html says:

    system() does exactly the same thing as exec LIST, except that a
    fork is done first and the parent process waits for the child
    process to exit.

and "man mount" says:

    .... Adding the -F option will make mount fork, ....

But I do not use mount -F .... hm......

A workaround could be to put the aufs mounts and the exports into separate loops and insert some delay in between. How long would that delay have to be?
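Or, instead of a blind delay, something like the following untested sketch is what I have in mind: check the exit status of every system() call, and after each mount poll /proc/mounts until the kernel really lists the aufs, and only then call exportfs. (Client numbers, branch options and the fsid scheme - 150 plus the client number would give the fsid=158 shown above - are guesses for illustration; the real script derives them per client.)

#!/usr/bin/perl
# Untested sketch: mount each client's aufs, wait until the kernel
# really lists it in /proc/mounts, then export it. Paths, branch
# options and the fsid scheme are guessed for illustration.
use strict;
use warnings;

# true if $mp is currently mounted as aufs, according to /proc/mounts
sub is_mounted {
    my ($mp) = @_;
    open my $fh, '<', '/proc/mounts' or die "cannot read /proc/mounts: $!";
    while (my $line = <$fh>) {
        my (undef, $mounted_on, $fstype) = split ' ', $line;
        return 1 if $mounted_on eq $mp and $fstype eq 'aufs';
    }
    return 0;
}

for my $n (2, 6, 8) {                     # client numbers, for illustration
    my $id = sprintf '%03d', $n;
    my $mp = "/cluster/mp/nfsr/aufs_$id";
    my $br = "br:/cluster/nfs/nfsroot/wheezy_cow/cow_$id=rw"
           . ":/cluster/nfs/nfsroot/wheezy_root_config=ro"
           . ":/cluster/nfs/nfsroot/wheezy_root_mask=ro+wh"
           . ":/cluster/nfs/nfsroot/wheezy=ro";

    # list form of system() avoids the shell; returns 0 on success
    system('mount', '-t', 'aufs', '-o', $br, 'none', $mp) == 0
        or die "mount $mp failed: $?";

    # mount should be done when system() returns, but to be paranoid,
    # poll until the mount point actually shows up
    my $tries = 0;
    until (is_mounted($mp)) {
        die "$mp still not in /proc/mounts after 5 s" if ++$tries > 50;
        select undef, undef, undef, 0.1;  # sleep 100 ms
    }

    my $opts = 'rw,no_root_squash,no_subtree_check,fsid=' . (150 + $n);
    system('exportfs', '-o', $opts, "192.168.130.$n:$mp") == 0
        or die "exportfs $mp failed: $?";
}

If mount without -F really is synchronous (system() already waits for the mount command itself to exit), the until loop should never have to wait, and the exit-status checks alone would be enough; if the stale handles disappear anyway, the race must be somewhere else.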
=================
further system details

Debian wheezy on all machines; the server runs a recent "experimental" kernel:

Linux cruncher 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 GNU/Linux
Linux blade-008.crunchnet.rosner.lokal 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2+deb7u2 x86_64 GNU/Linux

relevant Debian packages on the server:

ii  aufs-tools          1:3.0+20120411-2
ii  libnfsidmap2:amd64  0.25-4
ii  nfs-common          1:1.2.8-9
ii  nfs-kernel-server   1:1.2.8-9

--
Sincerely
Wolfgang Rosner