If you have experimented enough with NFS, I bet you have run into
the evil case of stale handles at least once.
This happens if the client and server encounter a communication error,
be it at the network level or due to a problem on the NFS server.
Processes accessing the broken mount end up in the
'D'oomed state. Worse yet, attempts to figure out what's happening,
be it `lsof`, `fuser`, you name it, get 'D'oomed as well. Moreover,
if the fs was mounted with the default NFS options,
processes in the D state ('Uninterruptible sleep (usually IO)'
according to the ps man page) become unkillable, and if you're on a
production system a reboot might be really costly.
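
To see which processes are stuck without poking the dead mount, a
small filter over ps output may help. This is only a sketch: the
function name d_state_rows is made up for illustration, and it assumes
rows of the form "PID STAT COMMAND" on stdin.

```shell
# Keep only the rows whose STAT column starts with D
# (uninterruptible sleep). Expects "PID STAT COMMAND" rows on stdin.
# d_state_rows is a hypothetical helper name, not a standard tool.
d_state_rows() {
    awk '$2 ~ /^D/'
}
```

It could be fed like this: ps -eo pid=,stat=,comm= | d_state_rows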
I have been in this situation twice. The first time, honestly, I gave
in and rebooted, especially after the depression I got from the
similar stories I came across as I crawled the net for answers. The
next time it happened I was determined, and here is how I got through.

Let's say your NFS client (192.168.1.5 clicky) has mounted an NFS share
on /var/bar from the NFS server (192.168.1.3 fserve).

On fserve, /etc/exports looks like:

    /home/foo clicky(rw)

On clicky you mounted the share with:

    mount -t nfs fserve:/home/foo /var/bar

So something goes wrong and we get the situation I described earlier.
Here is what to try:

1) Edit /etc/hosts on clicky and change the IP of fserve to another
   box on which you can export NFS ... (192.168.1.15 fserve). Hope that
   you have no other NFS shares from the original fserve :)

2) On that box, create a bogus export:

    mkdir /junk

   then edit /etc/exports to look like:

    /junk clicky(rw)

3) Now go to clicky (suffering right now) and run:

    mount -t nfs -o remount,intr fserve:/junk /var/bar

4) Now take it off:

    umount -f /var/bar

This will get your processes in the D state to fly away if you kill
them. In certain cases they'll switch to the 'T' state, which you can
get rid of with kill -CONT.

Just wanted to share,
-- 
abulyomon
www.KiLLTHeUPLiNK.com
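
A footnote to step 1: since the /etc/hosts change affects every share
that comes from fserve, it may be worth listing them first. The sketch
below reads rows in the /proc/mounts layout (device, mount point, fs
type, ...) from stdin and prints the mount points served by one host.
The name nfs_mounts_from is made up for illustration.

```shell
# Print the mount points of NFS shares served by the host given as $1.
# Expects /proc/mounts-style rows on stdin: device mountpoint fstype ...
# nfs_mounts_from is a hypothetical helper name, not a standard tool.
nfs_mounts_from() {
    # fstype starts with "nfs" (nfs, nfs4) and device starts with "host:"
    awk -v srv="$1" '$3 ~ /^nfs/ && index($1, srv ":") == 1 { print $2 }'
}
```

On clicky it could be run as: nfs_mounts_from fserve < /proc/mounts
Reading /proc/mounts only asks the kernel for its mount table, so it
should not touch the hung share itself.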
