We have thousands of Linux clients hitting NetApp file servers (many
3500 series, clustered) on a local gigabit LAN. From time to time,
applications return "file not found" when attempting to automount a
directory and access a file. One example is a long-running
process that reads in data, processes it for hours (during which time the
filesystem is unmounted), then tries to read more data from that mount
point, which causes a "file not found" error in the application. This
occurs about 1% of the time.
Researching at NetApp turns up this bit by Chuck Lever (Linux NFS
contributor):
"Using the Linux NFS Client with Network Appliance Filers"
http://www.netapp.com/library/tr/3183.pdf (February 2006)
Page 10 says:
"Due to a bug in the mount command, the default retransmission timeout
value on Linux for NFS over TCP is quite small...To obtain standard
behavior, we strongly recommend using "timeo=600, retrans=2" explicitly
when mounting via TCP."
Our defaults (assuming the man pages are correct; Red Hat Enterprise Linux 3)
would be timeo=7, retrans=3, which translates to 7+14+28+56 = 105 tenths
of a second (about 10.5 seconds). It appears NetApp is suggesting waiting
600+600 = 1200 tenths (120 seconds) before giving up on the mount command.
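
The arithmetic above can be sketched in a few lines of Python. This only
reproduces the sums as written in this message (doubling for the timeo=7
case, a flat 600+600 for the timeo=600 case). The function name and its
"attempts" and "doubling" parameters are made up for illustration, and the
sketch makes no claim about how the kernel actually backs off, which may
differ by transport and version.

```python
def total_wait_tenths(timeo, attempts, doubling):
    """Sum per-attempt timeouts, in tenths of a second.

    Hypothetical helper: 'attempts' and 'doubling' are assumptions chosen
    to match the sums in the message, not documented client behavior.
    """
    total = 0
    current = timeo
    for _ in range(attempts):
        total += current
        if doubling:
            current *= 2  # assumed exponential backoff between attempts
    return total


# timeo=7, retrans=3: initial try plus 3 retransmissions, doubling each time
default_wait = total_wait_tenths(7, 4, doubling=True)      # 7+14+28+56

# timeo=600, retrans=2: counted here as two flat 600-tenth waits
netapp_wait = total_wait_tenths(600, 2, doubling=False)    # 600+600

print(default_wait, "tenths =", default_wait / 10, "seconds")
print(netapp_wait, "tenths =", netapp_wait / 10, "seconds")
```

Note that the two cases are counted differently (four doubling attempts
versus two flat ones), so the comparison is only as good as those
assumptions.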
* What "bug" in the mount command do you believe NetApp is talking about?
* What do you think proper options for NFS auto/mounts would be for
extremely busy centralized NFS filers?
* What is the reference standard behavior?
Thanks,
--Greg
--
----------------------------------------------------------------------
Greg Baker 512-602-3287 (work)
[EMAIL PROTECTED] 512-602-6970 (fax)
5900 E. Ben White Blvd MS 626 512-555-1212 (info)
Austin, TX 78741