On Tue, 15 Feb 2000, Matt Herbert wrote:
>> I don't suppose you have any log file output from the client and server
>> that might explain why you couldn't mount any filesystems?
>
> Hrrm, a quick check of the client revealed these spooky looking messages:
>
> Feb 11 09:15:54 sublime kernel: nfs: server firewall not responding, \
> still trying
Hmmmm... Sounds like the server and client lost communication with each
other. NFS is a pain in the a** when that happens; it has a tendency to get
wedged and exhibit precisely the sorts of symptoms you were seeing.
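If it happens again, it may help to pull those kernel messages out of the syslog and see exactly when the client lost the server. Here is a rough sketch, assuming the message sits on a single line in the real log (the backslash above is just mail wrapping) and is worded exactly as quoted. The function name is made up for illustration.

```python
import re

# Matches the kernel message quoted above, e.g.
# "Feb 11 09:15:54 sublime kernel: nfs: server firewall not responding, still trying"
NOT_RESPONDING = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<client>\S+)\s+kernel:\s+"
    r"nfs: server (?P<server>\S+) not responding"
)

def nfs_timeouts(lines):
    """Return (timestamp, client, server) for each 'not responding' line."""
    hits = []
    for line in lines:
        m = NOT_RESPONDING.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("client"), m.group("server")))
    return hits

if __name__ == "__main__":
    # The first line is the one from the original report; the second is a
    # made-up unrelated line to show that it is skipped.
    sample = [
        "Feb 11 09:15:54 sublime kernel: nfs: server firewall not responding, still trying",
        "Feb 11 09:16:02 sublime some-other-daemon: unrelated message",
    ]
    for stamp, client, server in nfs_timeouts(sample):
        print(f"{stamp}: {client} lost contact with {server}")
```

Pointing it at the lines of the client's syslog file should give a quick timeline to compare against whatever was changed on the server.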
Despite the popular advice to mount everything "hard", I really recommend the
"soft" option (with a suitably large "timeo" value) when mounting NFS
filesystems. Yes, it means that a downed NFS server will cause programs
depending on it to eventually receive an I/O error. But I don't see what good
the alternative (an endlessly hung process) does.
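To make that concrete, here is a small sketch that builds an /etc/fstab entry with "soft" and an explicit "timeo". The server name is the one from your log; the export path, mount point, and the value 600 are placeholders I picked for illustration, not recommendations. Note that "timeo" is given in tenths of a second.

```python
def nfs_fstab_line(server, export, mountpoint, timeo_deciseconds, soft=True):
    """Build an /etc/fstab entry for an NFS mount.

    timeo_deciseconds is in tenths of a second, which is the unit the
    NFS "timeo" mount option expects.
    """
    options = ["soft" if soft else "hard", f"timeo={timeo_deciseconds}"]
    # fstab fields: device, mount point, fs type, options, dump, fsck pass
    return f"{server}:{export} {mountpoint} nfs {','.join(options)} 0 0"

if __name__ == "__main__":
    # Hypothetical export and mount point; 600 means 60 seconds.
    print(nfs_fstab_line("firewall", "/export/home", "/mnt/home", 600))
```

The resulting line can be pasted into /etc/fstab on the client, or the same option string handed to mount with -o.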
As for adding "intr" to "hard" so that said hung processes can be
interrupted: as I see it, it just means you have to kill the processes by
hand instead of them getting an error on file I/O. And if the right system
processes are deadlocked because of an NFS error, you may not be able to log
in to kill the hung processes.
I'd sooner have an I/O error in some programs than a hung system any day.
(Of course, this is all on the client side.)
> Now that I think about it, I had just recently switched over my domain to
> linux.bogus and removed all the entries from my /etc/hosts file (to be
> served by named) hmmmm...
Did you do that while NFS was running? NFS doesn't handle the unexpected
very well. I really wish there were a better standard for file sharing under
Unix.
> I didn't run 'netfs stop'. (redhat 6.0 doesn't have an nfslock)
I recommend doing that on all clients, too, before you restart the NFS
server. I know NFS is supposed to be stateless and all that, but I find NFS
servers (on Linux, anyway) tend to keep quite a bit of state around. :-)
> I used netstat -ap (but I like the tuw, that's pretty handy). I believe nfs
> uses port 2049, which was definitely not listed in the output.
NFS doesn't just grab a port on its own; it runs over Sun RPC, which relies
on the portmapper daemon, which claims TCP and UDP ports 111 on my system. As
I understand it, each RPC service binds its own port and registers it with
portmap, and clients ask portmap where to find it. nfsd normally does end up
on 2049, but the other daemons can land on arbitrary ports. So looking for
particular port numbers only gets you so far; you need to look at which
daemons are actually running. Specifically:
portmap
rpc.mountd
rpc.nfsd
rpc.rquotad
rpc.lockd
rpc.statd
and probably anything else that begins with "rpc." too. :-)
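A quick way to check for the above is to run the text of "ps aux" through something like the sketch below. It assumes the usual 11-column "ps aux" layout with the command in the last column, and it only matches on the program's base name, so a daemon that shows up under a different name on your system will be reported as missing. The function name is made up.

```python
EXPECTED = [
    "portmap", "rpc.mountd", "rpc.nfsd",
    "rpc.rquotad", "rpc.lockd", "rpc.statd",
]

def missing_daemons(ps_output, expected=EXPECTED):
    """Return the expected daemon names that do not appear in `ps aux` output."""
    running = set()
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 10)          # 11th field is the full command
        if len(fields) < 11:
            continue
        command = fields[10].split()[0]        # program, without its arguments
        running.add(command.rsplit("/", 1)[-1])  # drop any leading path
    return [name for name in expected if name not in running]
```

Save the output of "ps aux" to a file, read it in, and pass the text to missing_daemons(); an empty list means all six were found.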
>> Did you try a "ps aux" to see what was running?
>
> Sure did.
Do you remember if any of the above showed up?
> If this happens again in six months I'll figure it out instead of relying
> on the hail mary reboot ;)
Yah, at this point, we're pretty much examining the barn door after the
horse has wandered off, but as you say, if there is a next time, it may be
useful. :-)
--
Ben Scott
[EMAIL PROTECTED]