On 05/09/07, Anthony Menasse <[EMAIL PROTECTED]> wrote:
> Hello,
>
> We had some issues the other day with an NFS server under high load and
> clients which were attempting to automount the NFS shares timing out on
> the mount attempts. This surprised me as we use the nfs mount option
> "retry=2" in our auto.master file, which I assumed meant clients would
> keep retrying the mount attempt for 120 seconds.
>
> I decided to set up a simple test to understand what was going on, which I
> have described below.
>
> My questions are:
>
> * Should I expect the retry option to work as described in the nfs man
> page with automount?
>
> * Am I misunderstanding something about how automount works with
> the retry option?
>
> * Is my test method ok?
>
> I have attached the automount debug output from one of my test runs. Any
> feedback is much appreciated.
>
> Thanks
> Anthony
>
> Test setup:
> =========
>
> * Server is a xen instance running CentOS 5, kernel 2.6.18-8.el5xen
>
> * 2 clients running Fedora 3 with the latest autofs5 package from fc6,
> recompiled for our system (autofs-5.0.1-0.rc3.33). The behavior also seems
> to occur on machines with the latest Fedora 5 package (autofs-4.1.4-33),
> using vanilla kernel 2.6.20.4
>
> * Client 1 is accessing the NFS share via a static mount
>
> * Client 2 is accessing the NFS share via automount with the following 
> configuration:
>
> auto.master:
>
> /film /etc/mounts/auto.film retry=1000,nfsvers=3,fg

This is an automount operation. AFAIK the options here should only be
automount options, and per the automount man page none of those are.
On this note, what does automount do with those options?
- Are they passed through to each mount defined in the map? (This
would be kinda cool as a way to define global options.)
- Are they ignored?
- If neither of the above, why does automount not error or warn about bad options?

>
> auto.film:
>
> testmount -ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768 10.2.0.235:/export
>
> Steps:
> ========
> 1) On the server, set the nfs daemon count to 1, exported an nfs share and
> restarted the nfs daemon.
>
> 2) Thrashed the nfs share from client 1 using dd and cat.
>
> 3) on client 2 attempt to access automounted mount point (which is
> currently not mounted) on server using cd or ls.
>
>
> Expected Results:
> =================
> The mount attempt  on client 2 should not time out for 1000 minutes
>
> Actual Result:
> ==============
> The mount attempt times out rather quickly (I forgot to measure the time;
> I'm guessing it was about 60 seconds)
>
> The debug output shows the retry value is definitely getting passed to
> the mount command called by the automount daemon.
>
> Additional tests using static mounts give the expected behavior for
> retry. These were actually done by pausing the server instead of
> applying load to the server. (This was the original test method
> employed in the above test until I decided it might be better to thrash
> the server instead.)
>
> 1) The following times out after 60 seconds:
>
> mount 10.2.0.235:/export /mnt/tmp/ -o retry=0,bg,nfsvers=3
>
> 2) The following doesn't time out for 1000 minutes (well, I left it for a
> few minutes and it hadn't timed out):
>
> mount 10.2.0.235:/export /mnt/tmp/ -o retry=1000,bg,nfsvers=3
>

I found some info on patches made to linux-2.6.6 which seem to explain
some of this. The short description of the patch is: 'RPC: Make "major"
timeouts be of fixed length "timeo<<retrans" rather than counting the
number of retransmissions. The clock starts at the first attempt to
send each request.' (more info at
http://linux-nfs.org/Linux-2.6.x/2.6.6/) For whatever reason, this
does not seem to be reflected in any man pages.

Anyhow, from my understanding of things, a mount or file operation
won't time out or retry until a major timeout has occurred. With those
patches in place, a major timeout will occur after 60 seconds (by
default). Therefore retry=0 will try once, but it won't actually
time out until the major timeout of 60 seconds has elapsed.
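
To make the arithmetic concrete, here is a minimal Python sketch of the
"timeo<<retrans" rule quoted from the patch description. It assumes timeo
is given in tenths of a second, as the nfs man page describes it. The
function name and the sample values are only illustrative; they are not
defaults I have confirmed for any particular kernel or transport.

```python
def major_timeout_seconds(timeo_tenths, retrans):
    """Length of one major timeout under the "timeo<<retrans" rule.

    timeo_tenths is the timeo mount option (tenths of a second) and
    retrans is the retrans mount option. Hypothetical helper, for
    illustration only.
    """
    return (timeo_tenths << retrans) / 10.0


# Sample values only: timeo=7,retrans=3 gives 56 tenths of a second.
print(major_timeout_seconds(7, 3))
```

By the same arithmetic, a 60 second major timeout corresponds to
timeo<<retrans coming out at 600 tenths of a second.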


>
> Current conclusion:
> ===============
> setting the nfs mount option retry in automount map files does not work
> as I would expect.
>
> --
> anthony menasse
> systems administrator | [EMAIL PROTECTED]
> rising sun pictures | www.rsp.com.au
> direct line +61 2 9384 4572
>
>
>
> Sep  5 09:52:43 kalel logger: START TEST 3
> Sep  5 09:53:02 kalel automount[5436]: st_expire: state 1 path /film
> Sep  5 09:53:02 kalel automount[5436]: expire_proc: exp_proc = 3083578288 path
> /film
> Sep  5 09:53:02 kalel automount[5436]: mount still busy /film
> Sep  5 09:53:02 kalel automount[5436]: expire_cleanup: got thid 3083578288
> path /film stat 0
> Sep  5 09:53:02 kalel automount[5436]: expire_cleanup: sigchld: exp 3083578288
> finished, switching from 2 to 1
> Sep  5 09:53:02 kalel automount[5436]: st_ready: st_ready(): state = 2 path
> /film
> Sep  5 09:53:03 kalel automount[5436]: handle_packet: type = 3
> Sep  5 09:53:03 kalel automount[5436]: handle_packet_missing_indirect: token
> 20, name testmount, request pid 7422
> Sep  5 09:53:03 kalel automount[5436]: attempting to mount entry
> /film/testmount
> Sep  5 09:53:03 kalel automount[5436]: lookup_mount: lookup(file): looking up
> testmount
> Sep  5 09:53:03 kalel automount[5436]: lookup_mount: lookup(file): testmount
> -> -ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768
> 10.2.0.235:/export
> Sep  5 09:53:03 kalel automount[5436]: parse_mount: parse(sun): expanded
> entry: -ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768
> 10.2.0.235:/export
> Sep  5 09:53:03 kalel automount[5436]: parse_mount: parse(sun): gathered
> options:
> retry=1000,nfsvers=3,fg,ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768
> Sep  5 09:53:03 kalel automount[5436]: parse_mount: parse(sun):
> dequote("10.2.0.235:/export") -> 10.2.0.235:/export
> Sep  5 09:53:03 kalel automount[5436]: parse_mount: parse(sun): core of entry:
> options=retry=1000,nfsvers=3,fg,ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768,
> loc=10.2.0.235:/export
> Sep  5 09:53:03 kalel automount[5436]: sun_mount: parse(sun): mounting root
> /film, mountpoint testmount, what 10.2.0.235:/export, fstype nfs, options
> retry=1000,nfsvers=3,fg,ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768
> Sep  5 09:53:03 kalel automount[5436]: mount_mount: mount(nfs): root=/film
> name=testmount what=10.2.0.235:/export, fstype=nfs,
> options=retry=1000,nfsvers=3,fg,ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768
> Sep  5 09:53:03 kalel automount[5436]: mount_mount: mount(nfs): nfs
> options="retry=1000,nfsvers=3,fg,ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768",
> nosymlink=0, ro=1
> Sep  5 09:53:04 kalel automount[5436]: mount_mount: mount(nfs): calling
> mkdir_path /film/testmount
> Sep  5 09:53:04 kalel automount[5436]: mount_mount: mount(nfs): calling mount
> -t nfs -s -o
> retry=1000,nfsvers=3,fg,ro,noatime,hard,intr,nfsvers=3,tcp,port=2049,rsize=32768,wsize=32768
> 10.2.0.235:/export /film/testmount
> Sep  5 09:53:24 kalel automount[5436]: >> mount: RPC: Timed out
> Sep  5 09:53:24 kalel automount[5436]: mount(nfs): nfs: mount failure
> 10.2.0.235:/export on /film/testmount
> Sep  5 09:53:24 kalel automount[5436]: send_fail: token = 20
> Sep  5 09:53:24 kalel automount[5436]: failed to mount /film/testmount
>
>
> _______________________________________________
> autofs mailing list
> [email protected]
> http://linux.kernel.org/mailman/listinfo/autofs
>
>
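
A side note on the measurement: the attached debug output already gives
the elapsed time. A minimal Python sketch, using the two timestamps copied
from the log above, computes the gap between the mount call and the RPC
timeout. The year is an assumption, since syslog lines do not carry one,
and the helper name is hypothetical.

```python
from datetime import datetime

# Timestamps copied from the quoted debug output.
mount_called = "Sep  5 09:53:04"   # "calling mount -t nfs -s -o ..."
rpc_timed_out = "Sep  5 09:53:24"  # ">> mount: RPC: Timed out"


def parse_syslog_time(stamp, year=2007):
    # Syslog timestamps omit the year; 2007 is assumed from the date of this thread.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


elapsed = parse_syslog_time(rpc_timed_out) - parse_syslog_time(mount_called)
print(elapsed.total_seconds())
```

For this test run that works out to 20 seconds between the mount call and
the failure, rather than the roughly 60 seconds guessed above.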
