On Tue, 15 Feb 2000, Matt Herbert wrote:
>> I don't suppose you have any log file output from the client and server
>> that might explain why you couldn't mount any filesystems?
> 
> Hrrm, a quick check of the client revealed these spooky looking messages:
> 
> Feb 11 09:15:54 sublime kernel: nfs: server firewall not responding, \
> still trying

  Hmmmm... Sounds like the server and client lost communication with each
other.  NFS is a pain in the a** when that happens; it has a tendency to get
wedged and exhibit precisely the sorts of symptoms you were seeing.

  Despite the popular advice, I really recommend the "soft" option (with a
suitably large "timeo" value) when mounting NFS filesystems.  Yes, it means
that a downed NFS server will cause programs depending on it to eventually
receive an I/O error.  But I don't see what good the alternative (an endlessly
hung process) does.
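
  For example (a sketch -- the server name and paths here are made up;
"timeo" is in tenths of a second, and "retrans" is how many retries happen
before the I/O error comes back):

```shell
# Hypothetical soft mount with a generous timeout: each RPC request is
# retried 5 times, waiting 20 seconds per try, before the process gets
# an I/O error instead of hanging forever.
mount -t nfs -o soft,timeo=200,retrans=5 server:/export/home /mnt/home
```

The equivalent /etc/fstab options field would be "soft,timeo=200,retrans=5".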

  As far as adding "intr" to "hard" to allow said hung processes to be
interrupted goes:  As I see it, it just means you have to manually kill the
processes instead of them getting an error on file I/O.  And if the right
system processes are deadlocked because of an NFS error, you may not be able
to log in to kill the hung processes.

  I'd sooner have an I/O error in some programs than a hung system any day.

  (Of course, this is all on the client side.)
  
> Now that I think about it, I had just recently switched over my domain to
> linux.bogus and removed all the entries from my /etc/hosts file (to be
> served by named) hmmmm... 

  Did you do that while NFS was running?  NFS doesn't handle the unexpected
very well.  I really wish there was a better standard for file sharing under
Unix.

> I didn't run 'netfs stop'.  (redhat 6.0 doesn't have an nfslock)

  I recommend doing that on all clients, too, before you restart the NFS
server.  I know NFS is supposed to be stateless and all that, but I find NFS
servers (on Linux, anyway) tend to keep quite a bit of state around.  :-)
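
  Roughly, the order I have in mind (the init script paths are the Red Hat
6.x ones; adjust for your distribution):

```shell
# On each client: unmount the NFS filesystems first, so nothing is
# holding file handles when the server goes away.
/etc/rc.d/init.d/netfs stop

# On the server: restart the NFS daemons.
/etc/rc.d/init.d/nfs restart

# Back on each client: remount everything.
/etc/rc.d/init.d/netfs start
```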

> I used netstat -ap (but I like the tuw, that's pretty handy).  I believe nfs
> uses port 2049, which was definitely not listed in the output.

  NFS doesn't claim a port all by itself; it runs over Sun RPC, which is
handled by the portmapper daemon (TCP and UDP port 111 on my system).  I
believe that portmap allocates regular user ports to actually handle the RPC
requests it gets.  So looking for particular port numbers doesn't help; you
need to look at what is using them.  Specifically:

        portmap
        rpc.mountd
        rpc.nfsd
        rpc.rquotad
        rpc.lockd
        rpc.statd

and probably anything else that begins with "rpc." too.  :-)
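
  A quick way to check all of that at once (rpcinfo comes with the
portmapper package; the exact output will vary from system to system):

```shell
# List every RPC service currently registered with the portmapper on
# this host, along with the protocol and port each one was assigned:
rpcinfo -p localhost

# And verify the daemons themselves are actually running:
ps aux | egrep 'portmap|rpc\.'
```

If rpcinfo can't reach the portmapper at all, that by itself tells you
where the problem starts.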

>>   Did you try a "ps aux" to see what was running?
> 
> Sure did. 

  Do you remember if any of the above showed up?

> If this happens again in six months I'll figure it out instead of relying
> on the hail mary reboot ;)

  Yah, at this point, we're pretty much examining the barn door after the
horse has wandered off, but as you say, if there is a next time, it may be
useful.  :-)

-- 
Ben Scott
[EMAIL PROTECTED]





**********************************************************
To unsubscribe from this list, send mail to
[EMAIL PROTECTED] with the following text in the
*body* (*not* the subject line) of the letter:
unsubscribe gnhlug
**********************************************************