Our NFS service hangs periodically.  The symptom is that all of the user
desktops and shells freeze for between 2 and 17 seconds.

The following clients are connected to our Solaris NFS server:

QTY   DESCRIPTION
100+  Solaris clients (NFSv3 and NFSv4)
 50+  Linux clients (NFSv3)

The hangs usually occur while the filesystems are being shared on the server.
However, some hangs also occur when no share operation is running.  The
problem also depends on our NFS load.

All of the clients hang at the same time, and snoop shows that there are no
NFS replies from the server during the hang.  After a few seconds, the server
recovers.

We recently switched from NIS+ to LDAP.  After opening an SO and an
escalation, we determined that the problem seemed to be related to slow
response times from our LDAP servers.  We upgraded our V100 to a SunBlade 2000
with 2x750MHz CPUs, which has helped.  We also determined that the server
hangs more frequently when the HA-NFS monitor decides that it needs to share
our 180 filesystems again, which it does every time there is a mount or umount
of any filesystem on the HA-NFS server.  We have taken steps to limit the
number of mounts and umounts done on the server.
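
For anyone who wants to check the name-service side of this, here is a rough
sketch in Python of timing lookups and flagging the slow ones.  It assumes
that host lookups on the machine go through the LDAP backend (via
nsswitch.conf) and that no cache such as nscd answers first.  The function
names and the 0.5 second threshold are made up for illustration and are not
part of our setup.

```python
import socket
import time


def time_call(fn, *args):
    """Run fn(*args) and return (elapsed_seconds, succeeded)."""
    start = time.monotonic()
    try:
        fn(*args)
        ok = True
    except (OSError, KeyError):
        # socket.gaierror and socket.herror are subclasses of OSError;
        # KeyError covers pwd/grp style lookups if those are passed in.
        ok = False
    return time.monotonic() - start, ok


def slow_lookups(names, threshold=0.5, resolver=socket.gethostbyname):
    """Return (name, elapsed, succeeded) for lookups slower than threshold.

    The threshold is an arbitrary illustrative value, not a measured one.
    """
    slow = []
    for name in names:
        elapsed, ok = time_call(resolver, name)
        if elapsed > threshold:
            slow.append((name, elapsed, ok))
    return slow
```

Calling slow_lookups() with the client host names from the dfstab, around the
time of a re-share, would show whether individual lookups stall.  A different
resolver (for example pwd.getpwnam) can be passed in to test other maps.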

We replaced all the netgroups in the dfstab with the lists of machines that
the netgroups represent.  This helps, but we still have the hangs.

We are monitoring the hangs with a simple program that creates a randomly
named 8k file and then unlinks it.  If the operation takes longer than
1 second, we report how long it took.
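
A probe along those lines could look roughly like the Python sketch below.
This is not our actual program, only an illustration of the same idea.  The
file name prefix, the fsync call, and the 1 second polling interval are
assumptions added for the example.

```python
import os
import time
import uuid


def probe_once(directory, size=8192):
    """Create a randomly named file of `size` bytes, unlink it,
    and return the elapsed time in seconds."""
    path = os.path.join(directory, "nfsprobe-" + uuid.uuid4().hex)
    start = time.monotonic()
    with open(path, "xb") as f:       # "x": fail if the name already exists
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())          # assumption: force the data to the server
    os.unlink(path)
    return time.monotonic() - start


def monitor(directory, threshold=1.0, interval=1.0, iterations=None):
    """Probe repeatedly and print every probe slower than `threshold`.

    iterations=None means run until interrupted.
    Returns the list of slow timings that were seen.
    """
    slow = []
    count = 0
    while iterations is None or count < iterations:
        elapsed = probe_once(directory)
        if elapsed > threshold:
            slow.append(elapsed)
            print("%s hang of %.2f seconds"
                  % (time.strftime("%Y-%m-%d %H:%M:%S"), elapsed))
        count += 1
        time.sleep(interval)
    return slow
```

Running monitor() against a directory on one of the NFS mounts from a client
gives a timestamped log of hangs that can be lined up with share activity on
the server.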

Are there any ideas or suggestions about what might be going on here?

Thanks!
 
 
This message posted from opensolaris.org
