In message <[EMAIL PROTECTED]>, "J. Bruce Fields" writes:
> On Mon, Jan 21, 2008 at 05:08:28PM -0500, bfields wrote:
> > On Mon, Jan 21, 2008 at 03:28:51PM -0500, Erez Zadok wrote:
> > > 
> > > Here you go.  See the tcpdump in here:
> > > 
> > >   http://agora.fsl.cs.sunysb.edu/tmp/nfs/
> > > 
> > > I captured it on an x86_64 machine using
> > > 
> > >   tcpdump -s 0 -i lo -w tcpdump2
> > > 
> > > And it shows near the very end the ESTALE error.
> > 
> > Yep, thanks!  So frame 107855 has the MNT reply that returns the
> > filehandle in question, which is used in an ACCESS call in frame 107855
> > that gets an ESTALE.  Looks like an unhappy server!
> > 
> > > Do you think this could be related to nfs-utils?  I find that I can easily
> > > trigger this problem on an FC7 machine with nfs-utils-1.1.0-4.fc7 (within
> > > 10-30 runs of the above loop); but so far I cannot trigger the problem on 
> > > an
> > > FC6 machine with nfs-utils-1.0.10-14.fc6 (even after 300+ runs of the 
> > > above
> > > loop).
> > 
> > Yes, it's quite likely, though on a quick skim through the git logs I
> > don't see an obviously related commit...
> 
> It might help to turn on rpc cache debugging:
> 
>       echo 2048 >/proc/sys/sunrpc/rpc_debug
> 
> and then capture the contents of the /proc/net/rpc/*/content files just
> after the failure.
> 
> Possibly even better, though it'll produce a lot of stuff:
> 
>       strace -p `pidof rpc.mountd` -s4096 -otmp
> 
> and then pass along "tmp".

You can find both an strace and content files in

        http://agora.fsl.cs.sunysb.edu/tmp/nfs/

> And then of course if the regression is in nfs-utils then there's always
> a git-bisect as the debugging tool of last-resort: assuming you can
> reproduce the same regression between nfs-utils-1-0-10 and
> nfs-utils-1-1-0 from git://linux-nfs.org/nfs-utils, then all you'd need
> to do is clone that repo and do
> 
>       git bisect start
>       git bisect good nfs-utils-1-0-10
>       git bisect bad nfs-utils-1-1-0
> 
> And it shouldn't take more than 8 tries.
> 
> Sorry for not having any more clever suggestions....
> 
> --b.

I tried to bisect nfs-utils but it didn't work.  First, the latest version
of nfs-utils didn't configure for me.  It complained

        Unable to locate information required to use librpcsecgss.  If you
      have pkgconfig installed, you might try setting environment variable
      PKG_CONFIG_PATH to /usr/local/lib/pkgconfig

The above appears to be an error if you don't have librpcsecgss API >=
0.10.  But mine, on FC7. is 0.11.  (I'm using a vanilla FC7.)

So I ran configure --disable-gss and was finally able to build the utils.
But then, I was having mount.nfs hanging often; stracing it revealed that
mount(2) was getting EACESS as if the dir wasn't exported (but exportfs said
it was).  I don't know if disabling gss at configure time could have
resulted in these hangs.

I continued and tried a few more intermediate versions in the bisection, and
several of them failed to compile and/or configure and/or autogen.sh.

So I don't know what else I can do; this bug may have to be fixed the hard
way.  (BTW, I can get you a self contained VMware image that'll show the
bug, if you'd like.)

Cheers,
Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to