If you have experimented enough with NFS, I bet you have run into
the evil case of stale handles at least once.
This happens when the client and server hit a communication error,
be it at the network level or due to a problem on the nfs server.
Processes accessing the broken mounted file system end up in the
'D'oomed state. Worse yet, any attempt to figure out what's happening,
be it `lsof`, `fuser`, you name it, gets 'D'oomed as well. Moreover,
if the fs was mounted with the default nfs options, processes in the
D state ('uninterruptible sleep (usually IO)' according to the ps man
page) become unkillable, and if you're on a production system a reboot
might be really costly.
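
To see which processes are stuck without touching the dead mount, something like the sketch below may help. It only reads the process table through ps, so it should not hang the way `lsof` or `fuser` do. The function name d_state_procs is made up for illustration, and the ps column syntax assumes a procps-style ps as found on Linux.

```shell
#!/bin/sh
# Keep only lines whose second column (the state) starts with D,
# i.e. uninterruptible sleep. Expects "pid stat command" lines on stdin.
d_state_procs() {
    awk '$2 ~ /^D/'
}

# Typical use (the trailing "=" on each column name suppresses the header):
#   ps -eo pid=,stat=,comm= | d_state_procs
```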
I have been in this situation twice. The first time, honestly, I gave
in and rebooted, especially after the depression I got from reading
about similar situations as I crawled the net for answers. The
next time it happened I was determined, and here is how I got through:

Let's say your nfs client (192.168.1.5 clicky) has mounted an
nfs share on /var/bar from the nfs server (192.168.1.3 fserve).

On fserve /etc/exports looks like:

/home/foo       clicky(rw)

On clicky you mounted the share with

mount -t nfs fserve:/home/foo /var/bar

So something goes wrong and we get the situation I described earlier.

Here is what to try:

1) Edit /etc/hosts on clicky and change the ip of fserve to that of
another box from which you can export nfs (192.168.1.15 fserve).
Hope that you have no other nfs shares from the original fserve :)
2) On that other box, create a bogus export: mkdir /junk, then edit
/etc/exports to look like:
/junk    clicky(rw)
3) Now go to clicky (suffering right now) and
mount -t nfs -o remount,intr fserve:/junk /var/bar
4) Now take it off:
umount -f /var/bar
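
Steps 3 and 4 on clicky could be wrapped up roughly like this. It is only a sketch: the helper names run and recover_stale_mount are invented here, it assumes the file system type is passed with -t nfs as in the original mount command, and it assumes steps 1 and 2 (the /etc/hosts change and the bogus export) are already in place. By default it just prints the commands; set RUN=1 as root to actually execute them.

```shell
#!/bin/sh
# Print each command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Remount the stale mountpoint against the stand-in export with intr,
# then force-unmount it.
recover_stale_mount() {
    server=$1
    export_dir=$2
    mountpoint=$3
    run mount -t nfs -o remount,intr "$server:$export_dir" "$mountpoint"
    run umount -f "$mountpoint"
}

recover_stale_mount fserve /junk /var/bar
```

Doing a dry run first lets you check the exact command lines before touching a box that is already in trouble.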

This will get your processes in the D state to fly away if you kill
them. In certain cases they'll switch to the 'T' (stopped) state
instead, which you can get them out of with kill -CONT.
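
If several processes end up stopped, picking them out by hand gets tedious. A rough sketch for collecting their pids follows; stopped_pids is a hypothetical helper name, and the commented pipeline assumes a procps-style ps and an xargs that supports -r (GNU xargs does).

```shell
#!/bin/sh
# Print the pid of every process whose state starts with T (stopped).
# Expects "pid stat" lines on stdin.
stopped_pids() {
    awk '$2 ~ /^T/ { print $1 }'
}

# Typical use, sending SIGCONT to each stopped process:
#   ps -eo pid=,stat= | stopped_pids | xargs -r kill -CONT
```

Note that this matches every stopped process on the box, not only the ones that were stuck on the nfs mount, so you may want to review the list before piping it to kill.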

Just wanted to share,

--
abulyomon

www.KiLLTHeUPLiNK.com

_______________________________________________
General mailing list
[email protected]
http://mail.jolug.org/mailman/listinfo/general_jolug.org