In message <[EMAIL PROTECTED]>, "J. Bruce Fields" writes:
> On Mon, Jan 21, 2008 at 05:08:28PM -0500, bfields wrote:
> > On Mon, Jan 21, 2008 at 03:28:51PM -0500, Erez Zadok wrote:
> > >
> > > Here you go.  See the tcpdump in here:
> > >
> > > 	http://agora.fsl.cs.sunysb.edu/tmp/nfs/
> > >
> > > I captured it on an x86_64 machine using
> > >
> > > 	tcpdump -s 0 -i lo -w tcpdump2
> > >
> > > And it shows near the very end the ESTALE error.
> >
> > Yep, thanks!  So frame 107855 has the MNT reply that returns the
> > filehandle in question, which is used in an ACCESS call in frame 107855
> > that gets an ESTALE.  Looks like an unhappy server!
> >
> > > Do you think this could be related to nfs-utils?  I find that I can easily
> > > trigger this problem on an FC7 machine with nfs-utils-1.1.0-4.fc7 (within
> > > 10-30 runs of the above loop); but so far I cannot trigger the problem on an
> > > FC6 machine with nfs-utils-1.0.10-14.fc6 (even after 300+ runs of the above
> > > loop).
> >
> > Yes, it's quite likely, though on a quick skim through the git logs I
> > don't see an obviously related commit...
>
> It might help to turn on rpc cache debugging:
>
> 	echo 2048 >/proc/sys/sunrpc/rpc_debug
>
> and then capture the contents of the /proc/net/rpc/*/content files just
> after the failure.
>
> Possibly even better, though it'll produce a lot of stuff:
>
> 	strace -p `pidof rpc.mountd` -s4096 -otmp
>
> and then pass along "tmp".

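For anyone repeating the capture step, it could be scripted roughly as below. This is only a sketch: `snapshot_rpc_caches` is a hypothetical helper, not part of nfs-utils, and the output directory in the usage comment is an assumption. It copies each /proc/net/rpc/*/content file to a file named after its cache.

```shell
# Hypothetical helper (not part of nfs-utils): copy every
# RPCDIR/<cache>/content file into OUTDIR as <cache>.content.
# RPCDIR defaults to /proc/net/rpc; OUTDIR is created if missing.
snapshot_rpc_caches() {
    outdir=$1
    rpcdir=${2:-/proc/net/rpc}
    mkdir -p "$outdir" || return 1
    for f in "$rpcdir"/*/content; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -r "$f" ] || continue
        cache=$(basename "$(dirname "$f")")
        cat "$f" > "$outdir/$cache.content"
    done
}

# Assumed usage, as root, right after the ESTALE shows up:
#   echo 2048 > /proc/sys/sunrpc/rpc_debug
#   snapshot_rpc_caches /tmp/rpc-caches
```

Taking the directory as an optional second argument is just so the helper can be tried against a scratch directory before pointing it at /proc.
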
You can find both an strace and content files in

	http://agora.fsl.cs.sunysb.edu/tmp/nfs/

> And then of course if the regression is in nfs-utils then there's always
> a git-bisect as the debugging tool of last-resort: assuming you can
> reproduce the same regression between nfs-utils-1-0-10 and
> nfs-utils-1-1-0 from git://linux-nfs.org/nfs-utils, then all you'd need
> to do is clone that repo and do
>
> 	git bisect start
> 	git bisect good nfs-utils-1-0-10
> 	git bisect bad nfs-utils-1-1-0
>
> And it shouldn't take more than 8 tries.
>
> Sorry for not having any more clever suggestions....
>
> --b.

I tried to bisect nfs-utils but it didn't work.  First, the latest version
of nfs-utils didn't configure for me.  It complained:

  Unable to locate information required to use librpcsecgss. If you have
  pkgconfig installed, you might try setting environment variable
  PKG_CONFIG_PATH to /usr/local/lib/pkgconfig

The above appears to be an error if you don't have librpcsecgss API >= 0.10.
But mine, on FC7, is 0.11.  (I'm using a vanilla FC7.)  So I ran configure
--disable-gss and was finally able to build the utils.  But then mount.nfs
was hanging often; stracing it revealed that mount(2) was getting EACCES, as
if the dir wasn't exported (but exportfs said it was).  I don't know whether
disabling gss at configure time could have caused these hangs.

I continued and tried a few more intermediate versions in the bisection, and
several of them failed to compile and/or configure and/or autogen.sh.  So I
don't know what else I can do; this bug may have to be fixed the hard way.

(BTW, I can get you a self-contained VMware image that'll show the bug, if
you'd like.)

Cheers,
Erez.
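
On the revisions that would not build: a sketch of how a bisection can step over them, assuming a git recent enough that `git bisect run` treats exit status 125 as "skip this revision". `bisect_step` and the `./repro-estale.sh` reproducer named in the usage comment are hypothetical names, not anything shipped with nfs-utils, and the build command is only a guess based on the steps described above.

```shell
# Hypothetical wrapper for use from a script driven by `git bisect run`.
# Return status follows the bisect-run convention:
#   125 = revision cannot be tested (build failed), so skip it
#     0 = revision is good (reproducer succeeded)
#     1 = revision is bad (reproducer failed)
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A revision that does not build says nothing about the bug: skip it.
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125
    if sh -c "$test_cmd" >/dev/null 2>&1; then
        return 0
    else
        return 1
    fi
}

# Assumed usage from a clone of git://linux-nfs.org/nfs-utils, with this
# function saved in a script that ends by calling:
#   bisect_step './autogen.sh && ./configure --disable-gss && make' './repro-estale.sh'
#   exit $?
# and then:
#   git bisect start
#   git bisect good nfs-utils-1-0-10
#   git bisect bad nfs-utils-1-1-0
#   git bisect run /path/to/that-script
```

The reproducer would have to install the freshly built utilities, restart the daemons, and exit non-zero when the ESTALE appears; that part is left out here because it depends on the test machine.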