==> Regarding [autofs] Umount call getting stuck, hanging nfs?; Mike Marion 
<[EMAIL PROTECTED]> adds:

mmarion> Seeing some of our hosts, at only one site, having problems with
mmarion> hangs.  It seems to involve the same filer and even the same paths,
mmarion> but what I see is odd.  The kernel rpciod thread is even stuck in
mmarion> state D, seemingly because the umount call is.

mmarion> i.e.
mmarion>
mmarion> root 20302 1.2 0.0 2468 584 ?  D 12:01 2:39 /bin/umount //usr/local/projects/dsp/qdsp6
mmarion> root 6270 0.0 0.0 0 0 ?  D Apr28 3:17 [rpciod]

mmarion> Unfortunately, once this happens, any new mounts fail.  I can't
mmarion> even stat the path above via df.  Basically the whole NFS layer is
mmarion> stuck.

mmarion> Using autofs-4.1.4 with these patches:
mmarion>
mmarion>   autofs-4.1.4-misc-fixes.patch
mmarion>   autofs-4.1.4-multi-parse-fix.patch
mmarion>   autofs-4.1.4-non-replicated-ping.patch
mmarion>
mmarion> (slight possibility one of the above is missing, but I'm pretty
mmarion> damn sure they're in there).

mmarion> Mounts are TCP based so I can't even use a spoofed interface to
mmarion> force a umount.

mmarion> Wondering why the extra / in the path on the umount call as well.
mmarion> Also wondering if there's something in the filer (netapp) wrong
mmarion> that's giving some kind of response to the umount that's tickling
mmarion> a bug there.  Not much I've found online yet though.

mmarion> Oh, and umount call shows socks in fd list that don't appear to
mmarion> exist anymore:
mmarion>
mmarion> :~# ls -l /proc/20302/fd
mmarion> total 3
mmarion> dr-x------ 2 root root 0 May 4 15:26 .
mmarion> dr-xr-xr-x 3 root root 0 May 4 12:01 ..
mmarion> lrwx------ 1 root root 64 May 4 15:26 0 -> /dev/null
mmarion> l-wx------ 1 root root 64 May 4 15:26 1 -> pipe:[4528730]
mmarion> l-wx------ 1 root root 64 May 4 15:26 2 -> pipe:[4528730]
mmarion> :~ # socklist | grep 4528730
mmarion> :~ #

mmarion> Problem happens on hosts using the same autofs daemons with or
mmarion> without direct maps enabled.  Not really sure if it's technically
mmarion> an autofs issue (unless there's a glitch in how it's calling
mmarion> umount and its timing there) or an NFS layer issue.

mmarion> SLES9-SP1, kernel 2.6.5-7.147-smp (from suse-9.2 updates) on
mmarion> x86_64 hosts.

Really sounds like an NFS problem.  I'd post to the NFS list, and they'll
likely ask for over-the-wire messages.
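
When posting to the NFS list, it may also help to record exactly which
tasks are stuck in state D at the time of the hang, as in the ps output
quoted above.  The following is only a rough sketch, assuming a Linux
/proc filesystem and Python on the affected host; the function names
parse_stat and uninterruptible_tasks are made up for illustration and
are not part of autofs or any NFS tooling.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc="/proc"):
    """Return a list of (pid, comm) for processes currently in state D."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable while scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Running it as root from cron or a loop while the hang is in progress
would give a timestamped record of whether umount or rpciod entered
state D first, which could go alongside any packet capture.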

-Jeff

_______________________________________________
autofs mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/autofs
