[autofs] Re: [NFS] bug in linux mount? (says NetApp)

Gregory Baker Tue, 11 Jul 2006 16:35:19 -0700


Thanks Trond!


I was referring to the 'standard' comment from the netapp PDF:

"Due to a bug in the mount command, the default retransmission timeout
value on Linux for NFS over TCP is quite small...To obtain standard
behavior, we strongly recommend using "timeo=600, retrans=2" explicitly
when mounting via TCP."

And I was wondering what the 'standard' was. Chuck politely pointed me to Solaris as the NFSv3 reference for 'standard'.


Thanks,

--Greg

Trond Myklebust wrote:

On Tue, 2006-07-11 at 14:00 -0500, Gregory Baker wrote:
We have thousands of Linux clients hitting NetApp file servers (many 3500 series, clustered) on a local gigabit LAN. From time to time, applications return "file not found" when attempting to automount a directory and access a file. An example of this is a long-running process that reads in data, processes it for hours (during which time the filesystem is unmounted), then tries to read more data from that mount point, which causes a "file not found" error in the application. This occurs about 1/100th of the time.
Researching at NetApp turns up this piece by Chuck Lever (Linux NFS contributor):
"Using the Linux NFS Client with Network Appliance Filers"
http://www.netapp.com/library/tr/3183.pdf  (February 2006)

page 10 says...
"Due to a bug in the mount command, the default retransmission timeout value on Linux for NFS over TCP is quite small...To obtain standard behavior, we strongly recommend using "timeo=600, retrans=2" explicitly when mounting via TCP."
Our defaults (assuming the man pages are correct for Red Hat Enterprise Linux 3) would be timeo=7, retrans=3, which translates to 7+14+28+56 = 105 tenths of a second (about 10 seconds). It appears NetApp is suggesting waiting 600+600 = 1200 tenths (120 seconds) before giving up on the mount command...
No, they are not. See below.
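To make the arithmetic in the quoted paragraph concrete, here is a minimal shell sketch that sums the per-attempt timeouts for timeo=7, retrans=3. It assumes, as the quoted calculation does, that the first timeout is timeo tenths of a second and that it doubles after each retransmission; it ignores any upper cap on the per-attempt timeout and models only that calculation, not the kernel's actual retry logic for TCP mounts. The helper variables total, t and i are illustrative only.

```shell
#!/bin/sh
# Sketch of the quoted backoff arithmetic (assumption: timeout doubles
# after every retransmission, no upper cap applied).
timeo=7      # initial timeout, in tenths of a second
retrans=3    # number of retransmissions after the first attempt

total=0      # accumulated wait, in tenths of a second
t=$timeo     # timeout for the current attempt
i=0
# One original transmission plus $retrans retransmissions.
while [ "$i" -le "$retrans" ]; do
  total=$((total + t))
  t=$((t * 2))
  i=$((i + 1))
done

echo "$total tenths of a second"
```

Running it prints `105 tenths of a second`, matching the 7+14+28+56 sum above.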
* What "bug" in the mount command do you believe NetApp is talking about?
It has nothing to do with the mount timeout: Chuck is talking about the
retransmission timeout 'timeo' for NFS over TCP, which should indeed
be set to a high value since TCP guarantees message delivery (unlike UDP,
which requires a small timeo value). Setting it too low means that you
end up spamming your server with a load of unnecessary retransmissions.

This was indeed the case for some older versions of 'mount' and also for
older versions of the am-utils/amd automounters.
* What do you think proper options for NFS auto/mounts would be for extremely busy centralized NFS filers?
Something like

mount -t nfs -ohard,timeo=600,retrans=2,rsize=32768,wsize=32768,tcp foo:/ /bar

should be a fairly safe bet. You might want to add the 'intr' flag too,
depending on how you feel about the behaviour w.r.t. pressing ^C.
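Since the original question was about automounted filesystems, the same options can also be placed in an autofs map entry rather than on a mount command line. The fragment below is a minimal sketch under assumptions: the mount point /data and the map file /etc/auto.data are hypothetical names, while foo:/ and the option list are copied from the mount command above.

```
# /etc/auto.master (hypothetical mount point and map file)
/data   /etc/auto.data

# /etc/auto.data: key, then -options, then server:path
bar   -hard,timeo=600,retrans=2,rsize=32768,wsize=32768,tcp   foo:/
```

With these entries, accessing /data/bar would mount foo:/ there with the listed options; 'intr' can be appended to the option list in the same way if wanted.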
* What is the reference standard behavior?
To which reference are you referring?

Cheers,
  Trond


--
----------------------------------------------------------------------
Greg Baker                                         512-602-3287 (work)
[EMAIL PROTECTED]                              512-602-6970 (fax)
5900 E. Ben White Blvd MS 626                      512-555-1212 (info)
Austin, TX 78741



_______________________________________________
autofs mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/autofs
