Hello, the short story first: aufs on the server, exported via NFS as nfsroot for the clients.
The symptom on the client (first login after boot):

root@blade-008:~# la
ls: cannot open directory .: Stale NFS file handle
root@blade-008:~# cd .
root@blade-008:~# la
total 40
drwxr-xr-x  4 root root 4096 Feb  4  2015 .
drwxr-xr-x 35 root root 4096 Feb  4  2015 ..
drwx------  2 root root 4096 Jan 25  2015 .aptitude
-rw-r--r--  1 root root  907 Jan 25  2015 .bashrc
-rw-r--r--  1 root root    0 Jan 31  2015 kilroy.was.here
....

It is only partially reproducible. I don't think it is a "classical" NFS stale-handle problem, because the exports are prepared minutes before and are not changed while the clients boot. I rather suspect a latency problem, since the aufs mounts and the NFS exports are generated within the same perl script. Could it be that the aufs mount is forked off by the script and not yet completed before the NFS export is called? How can I find out? How can I avoid it? Other explanations?

-----------------------
some more details:

I am trying to build a beowulf-style cluster from old server hardware. I have a server "cruncher" and some clients ("blade0xx") attached to it. The clients are diskless and are booted via TFTP / PXE / NFS. This works fine for a read-only nfsroot and locks up, as expected, for a shared read-write nfsroot.

The idea now is to have aufs running on the _server_ to build an individual root file system for each client and export it via NFS. There are some HowTos around for running aufs on the _client_ and layering a ramdisk over a shared read-only nfsroot. That is not what I want, because I'd like to keep all configuration on the server, keep changes across reboots, and be able to inspect /var/log after whatever happened. So I decided to run aufs on the server - (at least) one nfsroot for every client.

Instead of configuring aufs and NFS through cumbersome and error-prone line-by-line copy-and-edit config files, I build them "on the fly" with a perl script: "mount -t aufs ..." and "exportfs ..." are called repeatedly in a loop via the perl system() command.

Another script translates /sys/fs/aufs/ into human-readable output, which looks like this:

none on /cluster/mp/nfsr/aufs_008 type aufs (rw,relatime,si=b8b59f115bf2cf56)
0 rw id=64 path=/cluster/nfs/nfsroot/wheezy_cow/cow_008
1 ro id=65 path=/cluster/nfs/nfsroot/wheezy_root_config
2 ro id=66 path=/cluster/nfs/nfsroot/wheezy_root_mask
3 ro id=67 path=/cluster/nfs/nfsroot/wheezy
xino: /cluster/nfs/nfsroot/wheezy_cow/cow_008/.aufs.xino

- layer 3 is the basic Debian installation, copied from an HD Debian setup
- layer 2 masks, among others, /root/some.files and /var/log;
  it is mounted as "ro+wh" (not visible in /sys/fs/aufs/)
- layer 1 is planned to be filled with different configurations, which
  may be switched without changing the underlying installation, simply
  by changing the mount path
- layer 0 is the copy-on-write layer, different for each client.

When I inspect the individual layers and the aufs on the server, everything looks as intended:

la /cluster/nfs/nfsroot/wheezy_root_mask/root/
total 8
drwx------   2 root root 4096 Jan 31 19:43 .
drwxr-xr-x   7 root root 4096 Feb  1 22:11 ..
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh..bash_history
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.foobar
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-002.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-006.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-008.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo.out

la /cluster/nfs/nfsroot/wheezy_root_config/root/
total 8
drwxr-xr-x 2 root root 4096 Jan 31 19:57 .
drwxr-xr-x 5 root root 4096 Feb  1 22:11 ..
-rw-r--r-- 1 root root    0 Jan 31 19:57 kilroy.was.here

la /cluster/mp/nfsr/aufs_008/root/
total 40
drwxr-xr-x  4 root root 4096 Feb  4 23:02 .
drwxr-xr-x 35 root root 4096 Feb  4 23:02 ..
drwx------  2 root root 4096 Jan 25 08:33 .aptitude
-rw-r--r--  1 root root  907 Jan 25 12:31 .bashrc
-rw-r--r--  1 root root    0 Jan 31 19:57 kilroy.was.here
-rw-------  1 root root   66 Jan 25 13:02 .lesshst
-rw-r--r--  1 root root  140 Nov 19  2007 .profile
drwx------  2 root root 4096 Jan 30 22:22 .ssh
-rw-------  1 root root 4769 Jan 25 12:31 .viminfo
-rw-------  1 root root  404 Feb  4 23:02 .Xauthority

root@cruncher:/cluster/etc/scripts/available# exportfs -v
....
/cluster/mp/nfsr/aufs_008
    192.168.130.8(rw,wdelay,crossmnt,no_root_squash,no_subtree_check,fsid=158,sec=sys,rw,no_root_squash,no_all_squash)
...

The individual nfsroots are mapped to the clients by IP-specific PXE configs.

The problem is not completely reproducible, but it appears that mid-range client counts show it, while low and high counts do not.

If my suspicion is right that I export the aufs before it is completely built, the best thing to do would be to check for completion before exportfs is called. How could I do this? Would it help to combine both commands on a single command line, called by perl system(), like

mount -t aufs ...... ; exportfs /my/aufs/mount ....

Is the mount blocking, or can it be configured to be?

http://perldoc.perl.org/functions/system.html says:

    system() does exactly the same thing as exec LIST, except that a
    fork is done first and the parent process waits for the child
    process to exit.

and "man mount" says:

    .... Adding the -F option will make mount fork, ....

But I do not use mount -F .... hm......

A workaround could be to put the aufs mounts and the exports into separate loops and insert some delay in between. How long would that delay have to be?
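Or, instead of a blind delay, something like the following untested sketch is what I have in mind: check the exit status of every system() call, and after each mount poll /proc/mounts until the kernel really lists the aufs, and only then call exportfs. (Client numbers, branch options and the fsid scheme - 150 plus the client number would give the fsid=158 shown above - are guesses for illustration; the real script derives them per client.)

#!/usr/bin/perl
# Untested sketch: mount each client's aufs, wait until the kernel
# really lists it in /proc/mounts, then export it. Paths, branch
# options and the fsid scheme are guessed for illustration.
use strict;
use warnings;

# true if $mp is currently mounted as aufs, according to /proc/mounts
sub is_mounted {
    my ($mp) = @_;
    open my $fh, '<', '/proc/mounts' or die "cannot read /proc/mounts: $!";
    while (my $line = <$fh>) {
        my (undef, $mounted_on, $fstype) = split ' ', $line;
        return 1 if $mounted_on eq $mp and $fstype eq 'aufs';
    }
    return 0;
}

for my $n (2, 6, 8) {                     # client numbers, for illustration
    my $id = sprintf '%03d', $n;
    my $mp = "/cluster/mp/nfsr/aufs_$id";
    my $br = "br:/cluster/nfs/nfsroot/wheezy_cow/cow_$id=rw"
           . ":/cluster/nfs/nfsroot/wheezy_root_config=ro"
           . ":/cluster/nfs/nfsroot/wheezy_root_mask=ro+wh"
           . ":/cluster/nfs/nfsroot/wheezy=ro";

    # list form of system() avoids the shell; returns 0 on success
    system('mount', '-t', 'aufs', '-o', $br, 'none', $mp) == 0
        or die "mount $mp failed: $?";

    # mount should be done when system() returns, but to be paranoid,
    # poll until the mount point actually shows up
    my $tries = 0;
    until (is_mounted($mp)) {
        die "$mp still not in /proc/mounts after 5 s" if ++$tries > 50;
        select undef, undef, undef, 0.1;  # sleep 100 ms
    }

    my $opts = 'rw,no_root_squash,no_subtree_check,fsid=' . (150 + $n);
    system('exportfs', '-o', $opts, "192.168.130.$n:$mp") == 0
        or die "exportfs $mp failed: $?";
}

If mount without -F really is synchronous (system() already waits for the mount command itself to exit), the until loop should never have to wait, and the exit-status checks alone would be enough; if the stale handles disappear anyway, the race must be somewhere else.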
=================
further system details

Debian wheezy on all machines; the server runs a recent "experimental" kernel:

Linux cruncher 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 GNU/Linux
Linux blade-008.crunchnet.rosner.lokal 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2+deb7u2 x86_64 GNU/Linux

relevant Debian packages on the server:

ii  aufs-tools          1:3.0+20120411-2
ii  libnfsidmap2:amd64  0.25-4
ii  nfs-common          1:1.2.8-9
ii  nfs-kernel-server   1:1.2.8-9

--
Sincerely
Wolfgang Rosner