We are having a problem with autofs getting stuck when a number of
directories are mounted on a number of machines within a short space
of time.
We have some compute nodes that use autofs to mount home directories
and system files, and we are seeing autofs hang as follows:
If you log in to a node interactively, everything mounts and works
correctly. However, if you submit a job to, say, 20 nodes at once, it
connects to the nodes via ssh, and each node mounts the home directory
/home/server/staff. Because of the way the maps are set up, that also
mounts /home/server/pg, /home/server/ug, /home/server/misc,
/home/server/group and /home/server/package, plus a few /usr/local
directories. What we then see is that a few of the nodes get stuck
trying to mount some of these directories. If you log in to such a
node interactively during this time, a df command hangs on one of the
autofs-controlled directories. You can mount the stuck NFS mounts
manually, and if you wait a few minutes everything frees up and becomes
available. Usually the stuck mount does become available, though
occasionally you get "permission denied" and have to restart automount.
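To try to reproduce the burst on one node without submitting a job,
something like the following Python sketch could hit all of the
autofs-managed directories at once from many threads and time each
access. It is untested on our nodes; the path list comes from the maps
and logs in this post, while the worker count (20, mirroring the
20-node job) and the 120-second cut-off are arbitrary assumptions:

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Directories taken from the maps and logs in this post (assumption:
# adjust to whatever the jobs actually touch).
AUTOFS_PATHS = [
    "/home/server/staff", "/home/server/pg", "/home/server/ug",
    "/home/server/misc", "/home/server/group", "/home/server/package",
    "/usr/local/man", "/usr/local/share", "/usr/local/bin", "/usr/local/sbin",
]


def touch(path):
    """stat() one path (enough to trigger an automount); return (status, seconds)."""
    start = time.monotonic()
    try:
        os.stat(path)
        status = "ok"
    except OSError as exc:
        status = "error: %s" % (exc.strerror or exc)
    return status, time.monotonic() - start


def probe_paths(paths, workers=20, timeout=120.0):
    """Access all paths concurrently; report any still blocked after `timeout` as 'hung'."""
    pool = ThreadPoolExecutor(max_workers=workers)
    futures = {pool.submit(touch, p): p for p in paths}
    done, _ = wait(futures, timeout=timeout)
    results = {}
    for fut, path in futures.items():
        results[path] = fut.result() if fut in done else ("hung", timeout)
    # Return without waiting for stuck threads; the interpreter still joins
    # them at exit, so a truly hung mount will delay the script's exit.
    pool.shutdown(wait=False)
    return results


if __name__ == "__main__":
    for path, (status, secs) in sorted(probe_paths(AUTOFS_PATHS).items()):
        print("%8.2fs  %-28s %s" % (secs, status, path))
```

A path reported as "hung", or one that takes minutes, would be the one
to check with df on that node while it is still stuck.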
I have attached the errors we see at the bottom. I haven't been able
to get debug output yet; the nodes I am running in debug mode happen
to be working at the moment :(
A very similar configuration worked in the past with SUSE 9.1 and
autofs 4.0.0. The same map configuration also works on other machines
with autofs-4.1.3-114, though those are used interactively and so
don't need to mount so many things so quickly.
I saw this thread, which may be relevant, but I'm not 100% sure:
http://hera.kernel.org/pipermail/autofs/2005-October/002521.html
Thanks for the help.
openSUSE 10 x86_64 / autofs-4.1.4-6 / kernel 2.6.15.1
auto.master file
================
/usr/local multi file /etc/auto.usr.local ---- yp auto.usr.local rw,intr,noquota,noac,actimeo=0
ypcat -k auto.master
====================
/data /etc/auto.data -rw,intr,noquota,nosuid
/home /etc/auto.home -rw,intr,noquota,noac,actimeo=0
/usr/local /etc/auto.usr.local -ro,intr,noquota
/etc/auto.usr.local file
========================
Apps -rw,intr,noquota master:/usr/exportlocal/&
Config -rw,intr,noquota master:/usr/exportlocal/&
Docs -rw,intr,noquota master:/usr/exportlocal/&
ypcat -k auto.usr.local
=======================
gsview -rw,intr,noquota server:/vol/vol0/unix/apps/&/$ARCH
molden -rw,intr,noquota server:/vol/vol0/unix/apps/&/$ARCH
etc.
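For reference, our understanding is that the & in these entries is
replaced by the map key and $ARCH by the value passed with
-DARCH=x86_64.linux on the automount command line. A simplified Python
sketch of that substitution follows; it ignores any other variables
and all quoting rules, so it is an illustration rather than autofs's
exact behaviour:

```python
from string import Template


def expand_location(key, location, defines):
    """Substitute '&' with the map key and $VAR / ${VAR} with -D style defines.

    Simplified illustration: unknown variables are left untouched and no
    escaping or quoting rules are applied.
    """
    return Template(location.replace("&", key)).safe_substitute(defines)


if __name__ == "__main__":
    defines = {"ARCH": "x86_64.linux"}  # from -DARCH=x86_64.linux
    for key in ("gsview", "molden"):
        print(key, expand_location(key, "server:/vol/vol0/unix/apps/&/$ARCH", defines))
```

For gsview this gives server:/vol/vol0/unix/apps/gsview/x86_64.linux,
the same pattern as the usrlocal/man/x86_64.linux export that shows up
in the errors below.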
ypcat -k auto.home
==================
server -rw,intr,quota,noac,actimeo=0 /staff &:/vol/vol0/staff /pg &:/vol/vol0/pg /ug &:/vol/vol0/ug /misc &:/vol/vol0/misc /group &:/vol/vol0/group /package &:/vol/vol0/package
server2 -rw,intr,noquota &:/local/home
server3 -rw,intr,quota,noac,actimeo=0 /staff &:/vol/vol0/staff /pg &:/vol/vol0/pg /ug &:/vol/vol0/ug /misc &:/vol/vol0/misc /group &:/vol/vol0/group /package &:/vol/vol0/package
etc.
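To make the fan-out of the server entry explicit, here is a small
Python sketch that splits a multi-mount entry into its offsets and NFS
locations. It only handles the simple form used in auto.home above (an
optional -options field followed by /offset location pairs, with &
standing for the key) and is our own illustration, not autofs's parser:

```python
def parse_multimount(key, entry):
    """Split a multi-mount map entry into (options, [(offset, location), ...]).

    Handles only the simple form used in auto.home above: an optional
    '-options' field followed by '/offset location' pairs, where '&' stands
    for the map key. Quoting, escapes and nesting are not handled.
    """
    fields = entry.split()
    options = ""
    if fields and fields[0].startswith("-"):
        options = fields.pop(0)[1:]
    if not fields or len(fields) % 2:
        raise ValueError("not a multi-mount entry: expected '/offset location' pairs")
    pairs = []
    for offset, location in zip(fields[0::2], fields[1::2]):
        if not offset.startswith("/"):
            raise ValueError("offset must start with '/': %r" % offset)
        pairs.append((offset, location.replace("&", key)))
    return options, pairs


if __name__ == "__main__":
    entry = ("-rw,intr,quota,noac,actimeo=0 /staff &:/vol/vol0/staff /pg &:/vol/vol0/pg "
             "/ug &:/vol/vol0/ug /misc &:/vol/vol0/misc /group &:/vol/vol0/group "
             "/package &:/vol/vol0/package")
    options, pairs = parse_multimount("server", entry)
    print(options)
    for offset, location in pairs:
        print("/home/server" + offset, "<-", location)
```

For the server key this yields six NFS mounts from the same filer, so
if we read the map correctly a job landing on 20 nodes at once asks
server for 20 x 6 = 120 mounts within seconds, before counting the
/usr/local ones. The plain server2 entry is not a multi-mount, so the
sketch rejects it with ValueError.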
Configured Mount Points:
------------------------
/usr/sbin/automount -v --timeout 3600 /data yp auto.data rw,intr,noquota,nosuid -DARCH=x86_64.linux
/usr/sbin/automount -v --timeout 3600 /usr/local multi file /etc/auto.usr.local -- yp auto.usr.local rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux
/usr/sbin/automount -v --timeout 3600 /home yp auto.home rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux
Active Mount Points:
--------------------
/usr/sbin/automount -v --timeout 3600 /data yp auto.data rw,intr,noquota,nosuid -DARCH=x86_64.linux
/usr/sbin/automount -v --timeout 3600 /usr/local multi file /etc/auto.usr.local -- yp auto.usr.local rw,intr,noquota,noac,actimeo=0
/usr/sbin/automount -v --timeout 3600 /home yp auto.home rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux
root 4820 0.0 0.0 12060 872 ? Ss May03 0:00 /usr/sbin/automount -v --timeout 3600 /data yp auto.data rw,intr,noquota,nosuid -DARCH=x86_64.linux
root 4822 0.0 0.0 12056 888 ? Ss May03 0:00 /usr/sbin/automount -v --timeout 3600 /usr/local multi file /etc/auto.usr.local -- yp auto.usr.local rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux
root 4901 0.0 0.0 9984 840 ? Ss May03 0:00 /usr/sbin/automount -v --timeout 3600 /home yp auto.home rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux
Some of the error messages we see are:
May 4 16:15:41 node073 automount[4879]: attempting to mount entry /home/server
May 4 16:17:27 node073 automount[4879]: attempting to mount entry /home/server/staff
May 4 16:17:27 node073 automount[12292]: failed to mount /home/server/staff
May 4 16:17:27 node073 automount[12292]: umount_multi: no mounts found under /home/server/staff
May 4 16:17:41 node073 automount[12288]: >> mount: server:/vol/vol0/ug: can't read superblock
May 4 16:17:41 node073 automount[12288]: mount(nfs): nfs: mount failure server:/vol/vol0/ug on /home/server/ug
May 4 16:18:37 node073 automount[4793]: attempting to mount entry /usr/local/man
May 4 16:19:10 node073 automount[12313]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
May 4 16:19:10 node073 automount[12313]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/man/x86_64.linux on /usr/local/man
May 4 16:19:10 node073 automount[12313]: failed to mount /usr/local/man
May 4 16:19:10 node073 automount[12313]: umount_multi: no mounts found under /usr/local/man
May 4 16:19:10 node073 automount[4793]: attempting to mount entry /usr/local/share
May 4 16:19:41 node073 automount[12288]: >> mount: server:/vol/vol0/pg: can't read superblock
May 4 16:19:41 node073 automount[12288]: mount(nfs): nfs: mount failure server:/vol/vol0/pg on /home/server/pg
May 4 16:19:43 node073 automount[12314]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
May 4 16:19:43 node073 automount[12314]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/share on /usr/local/share
May 4 16:19:43 node073 automount[12314]: failed to mount /usr/local/share
May 4 16:19:43 node073 automount[12314]: umount_multi: no mounts found under /usr/local/share
May 4 16:19:43 node073 automount[4793]: attempting to mount entry /usr/local/man
May 4 16:20:01 node073 /usr/sbin/cron[12318]: (root) CMD (/usr/local/Cluster-Apps/cluster-tools/sbin/ganglia)
May 4 16:20:16 node073 automount[12316]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
May 4 16:20:16 node073 automount[12316]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/man/x86_64.linux on /usr/local/man
May 4 16:20:16 node073 automount[12316]: failed to mount /usr/local/man
May 4 16:20:16 node073 automount[12316]: umount_multi: no mounts found under /usr/local/man
May 4 16:20:16 node073 automount[4793]: attempting to mount entry /usr/local/share
May 4 16:20:49 node073 automount[12326]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
May 4 16:20:49 node073 automount[12326]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/share on /usr/local/share
May 4 16:20:49 node073 automount[12326]: failed to mount /usr/local/share
May 4 16:20:49 node073 automount[12326]: umount_multi: no mounts found under /usr/local/share
May 4 16:20:49 node073 automount[4793]: attempting to mount entry /usr/local/man
May 4 16:21:22 node073 automount[12327]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
May 4 16:21:22 node073 automount[12327]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/man/x86_64.linux on /usr/local/man
May 4 16:21:22 node073 automount[12327]: failed to mount /usr/local/man
May 4 16:21:22 node073 automount[12327]: umount_multi: no mounts found under /usr/local/man
May 4 16:21:22 node073 automount[4793]: attempting to mount entry /usr/local/share
May 4 16:21:27 node073 automount[4793]: attempting to mount entry /usr/local/sbin
May 4 16:21:27 node073 automount[4793]: attempting to mount entry /usr/local/bin
May 4 16:21:27 node073 automount[4793]: attempting to mount entry /usr/local/Cluster-Config
May 4 16:21:52 node073 automount[4879]: attempting to mount entry /home/server/pg
May 4 16:21:52 node073 automount[12352]: failed to mount /home/server/pg
May 4 16:21:52 node073 automount[12352]: umount_multi: no mounts found under /home/server/pg
May 4 11:15:45 node018 automount[4804]: attempting to mount entry /usr/local/share
May 4 11:16:14 node018 automount[9807]: >> mount: mount point /usr/local/share does not exist
May 4 11:16:14 node018 automount[9807]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/share on /usr/local/share
May 4 11:16:14 node018 automount[9807]: failed to mount /usr/local/share
May 4 11:16:14 node018 automount[9807]: umount_multi: no mounts found under /usr/local/share
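To get an overview across nodes rather than reading these logs by
hand, a Python sketch like this one could count the failures per mount
point and the lock-file timeouts per host. It assumes one syslog record
per line, as in a typical /var/log/messages, and only recognises the
automount message texts quoted in this post:

```python
import re
from collections import Counter

# One automount syslog record per line; other daemons' lines are skipped.
RECORD_RE = re.compile(
    r"^\w{3}\s+\d+ \d\d:\d\d:\d\d (?P<host>\S+) automount\[\d+\]: (?P<msg>.*)$")
NFS_FAIL_RE = re.compile(r"mount\(nfs\): nfs: mount failure (\S+) on (\S+)")


def summarize(lines):
    """Count automount failures per mount point and lock-file timeouts per host."""
    failed = Counter()         # mount point -> "failed to mount" messages
    nfs_failures = Counter()   # (export, mount point) -> "mount failure" messages
    lock_timeouts = Counter()  # host -> "can't lock lock file timed out" messages
    for line in lines:
        m = RECORD_RE.match(line.strip())
        if m is None:
            continue  # not an automount record (e.g. cron or kernel lines)
        msg = m.group("msg")
        if msg.startswith("failed to mount "):
            failed[msg[len("failed to mount "):]] += 1
        elif "can't lock lock file timed out" in msg:
            lock_timeouts[m.group("host")] += 1
        else:
            nfs = NFS_FAIL_RE.search(msg)
            if nfs:
                nfs_failures[nfs.groups()] += 1
    return failed, nfs_failures, lock_timeouts
```

Calling summarize(open("/var/log/messages")) on node073, for example,
should list /usr/local/man and /usr/local/share among the repeated
failures and show the lock-file timeouts against that host.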
We also see occasional NFS statfs errors:
May 4 13:01:14 node010 kernel: [47400.196162] nfs_statfs: statfs error = 5
May 4 13:01:16 node010 kernel: [47402.529365] nfs_statfs: statfs error = 5
May 4 13:01:48 node010 kernel: [47434.809709] nfs_statfs: statfs error = 5
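The 5 in these messages is an errno value; on Linux it is EIO
("Input/output error"). A small Python helper for decoding such lines
is below; the regular expression assumes the exact "nfs_statfs: statfs
error = N" wording quoted above:

```python
import errno
import os
import re

STATFS_RE = re.compile(r"nfs_statfs: statfs error = (-?\d+)")


def decode_statfs_error(line):
    """Return (symbolic errno name, message) for an nfs_statfs kernel line, or None."""
    m = STATFS_RE.search(line)
    if m is None:
        return None
    code = abs(int(m.group(1)))  # accept either sign of the kernel's error value
    return errno.errorcode.get(code, "unknown"), os.strerror(code)
```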
_______________________________________________
autofs mailing list
http://linux.kernel.org/mailman/listinfo/autofs