Hello Wolfgang,

Wolfgang Rosner:
> short story before
> aufs on server, nfs export as nfsroot for clients
>
> the symptom on the client (first login after boot):
>
> root@blade-008:~# la
> ls: cannot open directory .: Stale NFS file handle
> root@blade-008:~# cd .
> root@blade-008:~# la
	:::
> I'd rather suspect a latency problem, since aufs mount and nfs eports are
> generated within the same perl scripts. May it be that aufs mount is forked
> from the script and not completed, before nfs export is called?
> How can I find out? how can I avoid? Other explanations?

Is the client booted correctly? And does it prompt for user login?
If so, I don't think the problem is the latency you describe, because a
booted client is strong evidence that aufs is mounted correctly and
exported correctly.
The mount procedure should be done synchronously, and nfsd should be able
to use the mount as soon as it completes. But that is totally up to your
mount(8) command. Is it the ordinary one from util-linux?

If ESTALE happened at a very early stage of mounting the nfsroot, then
your guess ("latency problem") might be possible.

As a first step, I'd suggest you look at what was done between the
completion of system boot and the "ls" (where ESTALE happened). In other
words, is there something unusual around "getty", "login" or
"~user/.profile"?
That is the story on your nfs-client. And, just to make sure, the same
applies to your nfs-server. Is there something unusual on your nfs-server
after the client has finished booting?

To investigate further, you need to find out which system call and which
module returns ESTALE. The candidates are
- open(2) in ls(1)
- nfs client
- nfs server
- aufs on the server
- branch fs in aufs

The debugging methods for those are
- strace
- wireshark
- aufs module parameter "debug"
With these tools and this feature, you will see the behaviour of these
modules.

> - layer 2 is masking among others /root/some.files and /var/log
> it's mounted as "ro+wh" (not visible in /sys/fs/aufs/)

Do you mean
- you specified "=ro+wh" as a branch permission
- but /sys/fs/aufs/si_*/br2 shows "ro"
right?
If so, there is something wrong. But I don't know whether it is related
to your "latency" problem.

> root@cruncher:/cluster/etc/scripts/available# exportfs -v
> ....
> /cluster/mp/nfsr/aufs_008 192.168.130.8
> (rw,wdelay,crossmnt,no_root_squash,no_subtree_check,fsid=158,sec=sys,rw,no_root_squash,no_all_squash)

Are you setting fsid to a different value for each client?
Such as
- fsid=158 for /cluster/mp/nfsr/aufs_008
- fsid=159 for /cluster/mp/nfsr/aufs_009
	:::

J. R. Okajima
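
The fsid question can also be checked mechanically on the server. What follows is only a minimal sketch in Python, under these assumptions: the text it parses has the layout of the "exportfs -v" output quoted above (exported path at the start of a line, with the client and options either on the same line or wrapped onto the next one), and the function name duplicate_fsids is made up for illustration and is not part of nfs-utils. It reports every fsid that is shared by more than one exported path.

```python
import re
from collections import defaultdict


def duplicate_fsids(exportfs_output):
    """Return {fsid: [paths]} for every fsid used by more than one export path.

    Hypothetical helper: it assumes the text layout printed by "exportfs -v".
    """
    paths_by_fsid = defaultdict(set)
    current_path = None
    for line in exportfs_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # An entry starts with the exported path in column 0; for long
        # paths the client and its options wrap onto the following line.
        if line.startswith("/"):
            current_path = stripped.split()[0]
        match = re.search(r"\bfsid=([^,)\s]+)", stripped)
        if match and current_path is not None:
            paths_by_fsid[match.group(1)].add(current_path)
    # The same path exported to several clients with one fsid is fine,
    # so only fsids that map to two or more distinct paths are reported.
    return {
        fsid: sorted(paths)
        for fsid, paths in paths_by_fsid.items()
        if len(paths) > 1
    }
```

To try it, capture the output of "exportfs -v" on the server (for example with subprocess.run and capture_output=True, text=True) and pass the resulting string to the function. An empty result means that no two export paths share an fsid.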