Hello,

short story first:
aufs on the server, NFS-exported as nfsroot for the clients

the symptom on the client (first login after boot):

root@blade-008:~# la
ls: cannot open directory .: Stale NFS file handle
root@blade-008:~# cd .
root@blade-008:~# la
total 40
drwxr-xr-x  4 root root 4096 Feb  4  2015 .
drwxr-xr-x 35 root root 4096 Feb  4  2015 ..
drwx------  2 root root 4096 Jan 25  2015 .aptitude
-rw-r--r--  1 root root  907 Jan 25  2015 .bashrc
-rw-r--r--  1 root root    0 Jan 31  2015 kilroy.was.here
....

It's only partially reproducible.

I don't think it's a "classical" stale NFS handle problem, because the exports 
are prepared minutes earlier and not changed while the clients boot.

I rather suspect a latency problem, since the aufs mounts and NFS exports are 
generated by the same perl script. Could it be that the aufs mount is forked 
off by the script and not yet completed before the NFS export is called?
How can I find out? How can I avoid it? Are there other explanations?

-----------------------
some more details:

I'm trying to build a Beowulf-style cluster from old server hardware.

I have a server "cruncher" and some clients ("blade0xx") attached to it.
The clients are diskless and booted via TFTP / PXE / NFS.
This works fine with a read-only nfsroot and, as expected, locks up with a 
shared rw nfsroot.

The idea now is to have aufs running on the _server_ to build an individual 
root file system for each client and export it via NFS. There are some HowTos 
around for having aufs on the _client_, laying a ramdisk over a shared 
read-only nfsroot.
That is not what I want, because I'd like to keep all configuration on the 
server, keep changes across reboots, and be able to inspect /var/log after 
whatever happened.

So I decided to have aufs on the server: (at least) one nfsroot for every 
client.

Instead of configuring aufs and NFS through cumbersome and error-prone 
line-by-line copy-and-edit config files, I build the aufs mounts and NFS 
exports "on the fly" with a perl script: "mount -t aufs" and "exportfs ..." 
are called repeatedly in a loop via perl's system().
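
Roughly like this (a simplified sketch, not the real script; the client 
count, IP scheme, fsid formula and aufs branch syntax are illustrative):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $base = "/cluster/nfs/nfsroot";
    my $mp   = "/cluster/mp/nfsr";

    for my $i (1 .. 16) {                   # client count is made up
        my $n  = sprintf "%03d", $i;
        my $br = "br=$base/wheezy_cow/cow_$n=rw"
               . ":$base/wheezy_root_config=ro"
               . ":$base/wheezy_root_mask=ro+wh"
               . ":$base/wheezy=ro";
        system("mount", "-t", "aufs", "-o", $br, "none", "$mp/aufs_$n") == 0
            or die "aufs mount $n failed: $?";
        system("exportfs", "-o",
               "rw,no_root_squash,no_subtree_check,fsid=" . (150 + $i),
               "192.168.130.$i:$mp/aufs_$n") == 0
            or die "exportfs $n failed: $?";
    }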

Another script translates /sys/fs/aufs/ into human-readable output, which 
reads like this:

none on /cluster/mp/nfsr/aufs_008 type aufs (rw,relatime,si=b8b59f115bf2cf56)
        0 rw id=64 path=/cluster/nfs/nfsroot/wheezy_cow/cow_008
        1 ro id=65 path=/cluster/nfs/nfsroot/wheezy_root_config
        2 ro id=66 path=/cluster/nfs/nfsroot/wheezy_root_mask
        3 ro id=67 path=/cluster/nfs/nfsroot/wheezy
        xino: /cluster/nfs/nfsroot/wheezy_cow/cow_008/.aufs.xino
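
(For the curious: the translation boils down to reading the brN files under 
/sys/fs/aufs/si_*; a rough sketch, not my actual script:)

    # each /sys/fs/aufs/si_*/brN file holds one "path=permission" line
    for my $si (glob "/sys/fs/aufs/si_*") {
        print "$si\n";
        for my $br (sort { ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0] }
                    glob "$si/br[0-9]*") {
            open my $fh, '<', $br or next;
            chomp(my $line = <$fh>);
            print "  $br: $line\n";
        }
    }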

- layer 3 is the basic Debian installation, copied from a hard-disk Debian setup
- layer 2 masks, among others, /root/some.files and /var/log; it's mounted as 
"ro+wh" (not visible in /sys/fs/aufs/)
- layer 1 is planned to be filled with different configurations which may be 
switched without changing the underlying installation, just by changing the 
mount path
- layer 0 is the copy-on-write layer, different for each client.

When I inspect the different layers and the aufs on the server, it all looks 
as intended:

 la /cluster/nfs/nfsroot/wheezy_root_mask/root/
total 8
drwx------   2 root root 4096 Jan 31 19:43 .
drwxr-xr-x   7 root root 4096 Feb  1 22:11 ..
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh..bash_history
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.foobar
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-002.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-006.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo-008.out
-r--r--r-- 119 root root    0 Jan 31 19:40 .wh.hwinfo.out

 la /cluster/nfs/nfsroot/wheezy_root_config/root/
total 8
drwxr-xr-x 2 root root 4096 Jan 31 19:57 .
drwxr-xr-x 5 root root 4096 Feb  1 22:11 ..
-rw-r--r-- 1 root root    0 Jan 31 19:57 kilroy.was.here

 la /cluster/mp/nfsr/aufs_008/root/
total 40
drwxr-xr-x  4 root root 4096 Feb  4 23:02 .
drwxr-xr-x 35 root root 4096 Feb  4 23:02 ..
drwx------  2 root root 4096 Jan 25 08:33 .aptitude
-rw-r--r--  1 root root  907 Jan 25 12:31 .bashrc
-rw-r--r--  1 root root    0 Jan 31 19:57 kilroy.was.here
-rw-------  1 root root   66 Jan 25 13:02 .lesshst
-rw-r--r--  1 root root  140 Nov 19  2007 .profile
drwx------  2 root root 4096 Jan 30 22:22 .ssh
-rw-------  1 root root 4769 Jan 25 12:31 .viminfo
-rw-------  1 root root  404 Feb  4 23:02 .Xauthority

root@cruncher:/cluster/etc/scripts/available# exportfs -v
....
/cluster/mp/nfsr/aufs_008                192.168.130.8
(rw,wdelay,crossmnt,no_root_squash,no_subtree_check,fsid=158,sec=sys,rw,no_root_squash,no_all_squash)
...

Each client is mapped to its individual nfsroot by an IP-specific PXE config.
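
(For reference, that mapping looks roughly like this; I'm assuming pxelinux 
with its hex-IP config file names here, and the server IP and kernel file 
names are illustrative:)

    # /tftpboot/pxelinux.cfg/C0A88208   (hex for 192.168.130.8)
    DEFAULT wheezy
    LABEL wheezy
      KERNEL vmlinuz-3.2.0-4-amd64
      APPEND initrd=initrd.img-3.2.0-4-amd64 root=/dev/nfs nfsroot=192.168.130.1:/cluster/mp/nfsr/aufs_008 ip=dhcp rw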

The problem is not completely reproducible, but it appears that mid-range 
client counts show it, while low and high counts do not.

If my suspicion is right that I export the aufs before it's completely built, 
the best thing to do would be to check for completion before exportfs is 
called. How could I do this? Would it help to combine both commands on a 
single command line, called by perl system(), like
"mount -t aufs ...... ; exportfs /my/aufs/mount ...."?
Is the mount blocking, or can it be configured to be?
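
Something like this polling check is what I have in mind (a sketch, assuming 
the mount is done once it shows up in /proc/mounts; the timeout values are 
guesses):

    # poll /proc/mounts until the aufs mountpoint is visible,
    # then it should be safe to call exportfs
    sub wait_for_mount {
        my ($mp, $tries) = @_;
        for (1 .. ($tries // 50)) {
            open my $fh, '<', '/proc/mounts' or die "/proc/mounts: $!";
            while (<$fh>) {
                my (undef, $mnt, $type) = split ' ';
                return 1 if $mnt eq $mp && $type eq 'aufs';
            }
            select undef, undef, undef, 0.1;   # sleep 100 ms
        }
        return 0;
    }

    wait_for_mount("/cluster/mp/nfsr/aufs_008")
        or warn "aufs mount still not visible in /proc/mounts";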


http://perldoc.perl.org/functions/system.html
says:
        system()
Does exactly the same thing as exec LIST, except that a fork is done first 
and the parent process waits for the child process to exit.

and "man mount" says:
.... Adding the -F option will make mount fork, ....

But I do not use mount -F 
.... hm......

A workaround could be to put the aufs mounts and the exports into two 
separate loops with some delay in between. How long would that delay need to 
be?
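
I.e. something like this (mount_aufs() / export_nfs() being hypothetical 
helpers wrapping the system() calls above; the sleep value is a pure guess):

    mount_aufs($_) for 1 .. 16;    # pass 1: build all aufs mounts
    sleep 5;                       # settle time -- how much is needed?
    export_nfs($_) for 1 .. 16;    # pass 2: export them all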


=================
further system details
Debian wheezy on all machines, server has a recent "experimental" kernel.

Linux cruncher 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 
GNU/Linux

Linux blade-008.crunchnet.rosner.lokal 3.2.0-4-amd64 #1 SMP Debian 
3.2.63-2+deb7u2 x86_64 GNU/Linux

relevant debian packages at server:
ii  aufs-tools            1:3.0+20120411-2
ii  libnfsidmap2:amd64    0.25-4
ii  nfs-common            1:1.2.8-9
ii  nfs-kernel-server     1:1.2.8-9


-- 
Sincerely 
Wolfgang Rosner
