==> Regarding [autofs] opensuse 10/autofs-4.1.4-6 Automount getting stuck with superblock/time out/no mount found errors; "Neil Millar" <[EMAIL PROTECTED]> adds:
nmillar> We are having a problem with autofs getting stuck mounting a
nmillar> number of directories on a number of machines in a short space of
nmillar> time.

nmillar> We have some compute nodes that use autofs to mount home
nmillar> directories and system files. We are seeing a problem with autofs
nmillar> hanging:

nmillar> If you log into a node interactively everything mounts and works
nmillar> correctly. However If you submit a job to, say 20 nodes at once it
nmillar> connects via ssh into the nodes, which each mount the home
nmillar> directory /server/staff and because of the setup also mounts
nmillar> server/pg server/ug server/misc server/group server/package and a
nmillar> few /usr/local directories, what we then see is that a few of the
nmillar> nodes will get stuck trying to mount some of these directories. If
nmillar> you login to the node interactively during this time a df command
nmillar> will get stuck on one of the autofs controlled directories. You
nmillar> can manually mount the stuck nfs mounts, and if you wait a few
nmillar> minutes everything frees up and becomes available. Usually the
nmillar> stuck mount becomes available, though occasionally you get
nmillar> permission denied and have to restart automount.

What do the logs on the NFS server show? If/when you do get debug logs
from the automounter, that will certainly help debug the problem.

-Jeff

p.s. your auto.master refers to /etc/auto.home, but you only include the
contents of the yp map auto.home.

nmillar> I have attached the errors we see at the bottom, I haven't been
nmillar> able to get debug output yet. The nodes I am running in debug
nmillar> happen to be working at the moment :(

nmillar> A very similar configuration has worked in the past with suse 9.1
nmillar> and autofs 4.0.0. The same map configuration works on other
nmillar> machines with autofs-4.1.3-114 though they are used interactively
nmillar> so don't need to mount so many things so quickly.

nmillar> I saw this thread that may be relevant but wasn't 100% sure?:
nmillar> http://hera.kernel.org/pipermail/autofs/2005-October/002521.html

nmillar> Thanks for the help.

nmillar> opensuse 10 x86_64 / autofs-4.1.4-6 / Kernel 2.6.15.1

nmillar> auto.master file
nmillar> ================
nmillar> /usr/local multi file /etc/auto.usr.local ---- yp auto.usr.local rw,intr,noquota,noac,actimeo=0

nmillar> ypcat -k auto.master
nmillar> ====================
nmillar> /data /etc/auto.data -rw,intr,noquota,nosuid
nmillar> /home /etc/auto.home -rw,intr,noquota,noac,actimeo=0
nmillar> /usr/local /etc/auto.usr.local -ro,intr,noquota

nmillar> /etc/auto.usr.local file
nmillar> ========================
nmillar> Apps -rw,intr,noquota master:/usr/exportlocal/&
nmillar> Config -rw,intr,noquota master:/usr/exportlocal/&
nmillar> Docs -rw,intr,noquota master:/usr/exportlocal/&

nmillar> ypcat -k auto.usr.local
nmillar> =======================
nmillar> gsview -rw,intr,noquota server:/vol/vol0/unix/apps/&/$ARCH
nmillar> molden -rw,intr,noquota server:/vol/vol0/unix/apps/&/$ARCH
nmillar> etc...

nmillar> ypcat -k auto.home
nmillar> ==================
nmillar> server -rw,intr,quota,noac,actimeo=0 /staff &:/vol/vol0/staff /pg &:/vol/vol0/pg /ug &:/vol/vol0/ug /misc &:/vol/vol0/misc /group &:/vol/vol0/group /package &:/vol/vol0/package
nmillar> server2 -rw,intr,noquota &:/local/home
nmillar> server3 -rw,intr,quota,noac,actimeo=0 /staff &:/vol/vol0/staff /pg &:/vol/vol0/pg /ug &:/vol/vol0/ug /misc &:/vol/vol0/misc /group &:/vol/vol0/group /package &:/vol/vol0/package
nmillar> etc.

nmillar> Configured Mount Points:
nmillar> ------------------------
nmillar> /usr/sbin/automount -v --timeout 3600 /data yp auto.data rw,intr,noquota,nosuid -DARCH=x86_64.linux
nmillar> /usr/sbin/automount -v --timeout 3600 /usr/local multi file /etc/auto.usr.local -- yp auto.usr.local rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux
nmillar> /usr/sbin/automount -v --timeout 3600 /home yp auto.home rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux

nmillar> Active Mount Points:
nmillar> --------------------
nmillar> /usr/sbin/automount -v --timeout 3600 /data yp auto.data rw,intr,noquota,nosuid -DARCH=x86_64.linux
nmillar> /usr/sbin/automount -v --timeout 3600 /usr/local multi file /etc/auto.usr.local -- yp auto.usr.local rw,intr,noquota,noac,actimeo=0
nmillar> /usr/sbin/automount -v --timeout 3600 /home yp auto.home rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux

nmillar> root 4820 0.0 0.0 12060 872 ? Ss May03 0:00 /usr/sbin/automount -v --timeout 3600 /data yp auto.data rw,intr,noquota,nosuid -DARCH=x86_64.linux
nmillar> root 4822 0.0 0.0 12056 888 ? Ss May03 0:00 /usr/sbin/automount -v --timeout 3600 /usr/local multi file /etc/auto.usr.local -- yp auto.usr.local rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux
nmillar> root 4901 0.0 0.0 9984 840 ? Ss May03 0:00 /usr/sbin/automount -v --timeout 3600 /home yp auto.home rw,intr,noquota,noac,actimeo=0 -DARCH=x86_64.linux

nmillar> Some of the error messages we see are:

nmillar> May 4 16:15:41 node073 automount[4879]: attempting to mount entry /home/server
nmillar> May 4 16:17:27 node073 automount[4879]: attempting to mount entry /home/server/staff
nmillar> May 4 16:17:27 node073 automount[12292]: failed to mount /home/server/staff
nmillar> May 4 16:17:27 node073 automount[12292]: umount_multi: no mounts found under /home/server/staff
nmillar> May 4 16:17:41 node073 automount[12288]: >> mount: server:/vol/vol0/ug: can't read superblock
nmillar> May 4 16:17:41 node073 automount[12288]: mount(nfs): nfs: mount failure server:/vol/vol0/ug on /home/server/ug
nmillar> May 4 16:18:37 node073 automount[4793]: attempting to mount entry /usr/local/man
nmillar> May 4 16:19:10 node073 automount[12313]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
nmillar> May 4 16:19:10 node073 automount[12313]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/man/x86_64.linux on /usr/local/man
nmillar> May 4 16:19:10 node073 automount[12313]: failed to mount /usr/local/man
nmillar> May 4 16:19:10 node073 automount[12313]: umount_multi: no mounts found under /usr/local/man
nmillar> May 4 16:19:10 node073 automount[4793]: attempting to mount entry /usr/local/share
nmillar> May 4 16:19:41 node073 automount[12288]: >> mount: server:/vol/vol0/pg: can't read superblock
nmillar> May 4 16:19:41 node073 automount[12288]: mount(nfs): nfs: mount failure server:/vol/vol0/pg on /home/server/pg
nmillar> May 4 16:19:43 node073 automount[12314]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
nmillar> May 4 16:19:43 node073 automount[12314]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/share on /usr/local/share
nmillar> May 4 16:19:43 node073 automount[12314]: failed to mount /usr/local/share
nmillar> May 4 16:19:43 node073 automount[12314]: umount_multi: no mounts found under /usr/local/share
nmillar> May 4 16:19:43 node073 automount[4793]: attempting to mount entry /usr/local/man
nmillar> May 4 16:20:01 node073 /usr/sbin/cron[12318]: (root) CMD (/usr/local/Cluster-Apps/cluster-tools/sbin/ganglia)
nmillar> May 4 16:20:16 node073 automount[12316]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
nmillar> May 4 16:20:16 node073 automount[12316]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/man/x86_64.linux on /usr/local/man
nmillar> May 4 16:20:16 node073 automount[12316]: failed to mount /usr/local/man
nmillar> May 4 16:20:16 node073 automount[12316]: umount_multi: no mounts found under /usr/local/man
nmillar> May 4 16:20:16 node073 automount[4793]: attempting to mount entry /usr/local/share
nmillar> May 4 16:20:49 node073 automount[12326]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
nmillar> May 4 16:20:49 node073 automount[12326]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/share on /usr/local/share
nmillar> May 4 16:20:49 node073 automount[12326]: failed to mount /usr/local/share
nmillar> May 4 16:20:49 node073 automount[12326]: umount_multi: no mounts found under /usr/local/share
nmillar> May 4 16:20:49 node073 automount[4793]: attempting to mount entry /usr/local/man
nmillar> May 4 16:21:22 node073 automount[12327]: aquire_lock: can't lock lock file timed out: /var/lock/autofs
nmillar> May 4 16:21:22 node073 automount[12327]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/man/x86_64.linux on /usr/local/man
nmillar> May 4 16:21:22 node073 automount[12327]: failed to mount /usr/local/man
nmillar> May 4 16:21:22 node073 automount[12327]: umount_multi: no mounts found under /usr/local/man
nmillar> May 4 16:21:22 node073 automount[4793]: attempting to mount entry /usr/local/share
nmillar> May 4 16:21:27 node073 automount[4793]: attempting to mount entry /usr/local/sbin
nmillar> May 4 16:21:27 node073 automount[4793]: attempting to mount entry /usr/local/bin
nmillar> May 4 16:21:27 node073 automount[4793]: attempting to mount entry /usr/local/Cluster-Config
nmillar> May 4 16:21:52 node073 automount[4879]: attempting to mount entry /home/server/pg
nmillar> May 4 16:21:52 node073 automount[12352]: failed to mount /home/server/pg
nmillar> May 4 16:21:52 node073 automount[12352]: umount_multi: no mounts found under /home/server/pg

nmillar> May 4 11:15:45 node018 automount[4804]: attempting to mount entry /usr/local/share
nmillar> May 4 11:16:14 node018 automount[9807]: >> mount: mount point /usr/local/share does not exist
nmillar> May 4 11:16:14 node018 automount[9807]: mount(nfs): nfs: mount failure server:/vol/vol0/unix/apps/usrlocal/share on /usr/local/share
nmillar> May 4 11:16:14 node018 automount[9807]: failed to mount /usr/local/share
nmillar> May 4 11:16:14 node018 automount[9807]: umount_multi: no mounts found under /usr/local/share

nmillar> We also see occasional nfs statfs errors:

nmillar> May 4 13:01:14 node010 kernel: [47400.196162] nfs_statfs: statfs error = 5
nmillar> May 4 13:01:16 node010 kernel: [47402.529365] nfs_statfs: statfs error = 5
nmillar> May 4 13:01:48 node010 kernel: [47434.809709] nfs_statfs: statfs error = 5

nmillar> _______________________________________________
nmillar> autofs mailing list
nmillar> [email protected]
nmillar> http://linux.kernel.org/mailman/listinfo/autofs
