Re: [Beowulf] NFSv3 client hangs - tcp v/s udp.

Joshua Baker-LePain Thu, 11 May 2006 07:51:33 -0700

On Wed, 3 May 2006 at 5:21pm, Amitoj G. Singh wrote

After upgrading from Red Hat 7.1 to Red Hat EL 4 we realized that 1 in 10
user jobs was failing because an NFS mount point on a worker node had
stopped responding. The NFS mount points on the worker nodes would become
unresponsive during heavy NFS I/O. A simple "netstat -t" on the
head-node showed that there were thousands of open TCP nfs sockets on the
head-node. Worker nodes that had frozen NFS mount points responded with
the following error message:


nfs_statfs: error no = 512

I had a discussion with Trond about that error on the NFS list back in
December. That error essentially means that somebody (or something)
interrupted the RPC call. E.g., you type 'df', it hangs waiting for an NFS
mount, and you 'CTRL-C' it -- you'll see that error. The error message
was planned to be removed in 2.6.16. IOW, it's a symptom, not the
problem.
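
For anyone who wants to track the socket build-up that "netstat -t" showed
on the head-node, here is a minimal sketch that counts TCP sockets on the
NFS port, grouped by state. It assumes the usual "netstat -tn" column
layout and the standard NFS port 2049; the function name and the port
constant are illustrative, not something from the original report.

```python
import subprocess
from collections import Counter

NFS_PORT = 2049  # assumption: the server uses the standard NFS port


def count_nfs_sockets(netstat_output, port=NFS_PORT):
    """Count TCP sockets that touch the given port, grouped by state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port remote:port STATE
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if str(port) in ports:
            counts[state] += 1
    return counts


if __name__ == "__main__":
    # -n keeps ports numeric so the match on 2049 works.
    out = subprocess.run(["netstat", "-tn"], capture_output=True,
                         text=True, check=True).stdout
    for state, n in count_nfs_sockets(out).most_common():
        print(state, n)
```

Run periodically on the head-node during heavy I/O, this would show
whether the number of NFS sockets keeps growing before a hang.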

We recently switched all our NFS mounts to use udp and have had no worker
nodes with failing or unresponsive NFS mount points.
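
To confirm which transport each worker node is actually using after a
switch like this, a rough sketch along the following lines may help. It
parses text in /proc/mounts format and reports the transport named in each
NFS mount's options. It assumes the kernel lists either a bare "tcp"/"udp"
option or a "proto=" option there; the function name is made up for
illustration.

```python
def nfs_transports(mounts_text):
    """Map each NFS mount point to the transport named in its options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        mount_point, options = fields[1], fields[3].split(",")
        transport = "unknown"
        for opt in options:
            if opt in ("tcp", "udp"):
                transport = opt
            elif opt.startswith("proto="):
                transport = opt.split("=", 1)[1]
        result[mount_point] = transport
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point, transport in nfs_transports(f.read()).items():
            print(mount_point, transport)
```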

Thought I would share this bit of experience with the list. Interestingly
while googling we did not find a lot of chatter about this issue.

I've seen some discussions of this on nahant-list as well as the NFS list.
The problem is that it's hard to easily reproduce. If you have a test
case and a support contract, I'd heartily recommend getting in touch with
RH about it directly.


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
