Hi all.  I'm running Ubuntu 8.04 (kernel 2.6.24-17-generic, autofs
4.1.4+debian-2.1ubuntu2) on an Intel Pentium D 3GHz system with
hyperthreading (SMP kernel) and 1G RAM.

I'm using DHCP for networking and obtaining my automount maps via NIS.

For the last week or so, almost every morning when I come into work my
system is hung up in a strange way.  I can move my mouse but I never get
asked for my password to unlock my screen.  I can C-A-F1 etc. to get
back to a console but after I type my username at the login prompt, I
never get asked for a password and then that console is locked up.  If I
have a console session already logged in from the day before, then I can
use it for a while but eventually some command will lock hard; can't ^C,
can't ^Z, can't kill -9, nothing.

If I try to C-A-D to reboot, the system starts to come down but then
hangs, hard, trying to bring down automount.  Reset just tries to reboot
again and hangs in the same place.  I have to power the system off and
on completely.  Bummer.

I did some debugging on this problem.  I logged in as root on every
console (F1-F6).  The next morning when the system was hung, I found a
command that hung (just "ls") and then I ran it in another console under
strace.

It turns out that ls opens /proc/mounts, which succeeds, and then tries
to read(2) from it.  The read system call never returns, and there's no
way at all to kill the process once it's in that state.  I also note
that the load on the system is very high: typically over 7.  However,
top shows no processes chewing CPU.  There are also some "duplicate"
automount processes running (that is, more than one for the same map).
After I reboot, of course, everything is fine.
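
A high load with idle CPUs is what you would expect if processes are
piling up in uninterruptible sleep (state D): Linux counts those toward
the load average, and they cannot be killed.  A minimal sketch for
listing them, assuming a procps-style ps; the function name dstate_procs
is made up for illustration:

```shell
#!/bin/sh
# Sketch: print processes whose state starts with "D" (uninterruptible sleep).
# Reads "PID STAT WCHAN COMMAND" lines on stdin, so it can be fed from ps.
dstate_procs() {
    awk '$2 ~ /^D/ { print }'
}

# Example (procps ps; wchan shows the kernel function the task sleeps in):
#   ps -eo pid,stat,wchan:32,comm | dstate_procs
```

If the hung ls and its friends show up here, the wchan column may hint
at where in the kernel they are waiting.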

Last night I started all the consoles and in one of them I wrote a
little shell script that ran `date`, then did cat /proc/mounts, then
slept for 15 seconds, then did it again.  I sent the output to a file.
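
A loop along these lines does the job.  This is a sketch, not
necessarily the exact script; the function name log_mounts, its
arguments, and the log file name in the example are made up for
illustration:

```shell
#!/bin/sh
# Sketch: periodically record a timestamp and the mount table.
# log_mounts LOGFILE [INTERVAL_SECONDS] [COUNT] [SOURCE]
#   COUNT=0 means loop forever; SOURCE defaults to /proc/mounts.
log_mounts() {
    logfile=$1
    interval=${2:-15}
    count=${3:-0}
    source_file=${4:-/proc/mounts}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        date >> "$logfile"
        # If reading the mount table blocks, the loop stops here, so the
        # last timestamp in the log shows roughly when the hang began.
        cat "$source_file" >> "$logfile"
        echo >> "$logfile"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example: log_mounts /root/mounts.log 15
```

Appending with >> on every step means whatever was written before the
hang is already in the file when the system is power-cycled.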

I found that the hang happened last night at ~22:51 EDT.  There was
nothing interesting in the messages log, but in syslog I find a lot of
messages right around that time from automount failing to mount
non-existent directories (this is caused by some bogosity in the Tracker
utility in Gnome, but it shouldn't cause the system to hang!):

Jun  2 22:51:29 psmithub automount[29241]: >> mount.nfs: access denied by server while mounting snap-dev01:/user/.Trash-10490
Jun  2 22:51:29 psmithub automount[29241]: mount(nfs): nfs: mount failure snap-dev01:/user/.Trash-10490 on /user/.Trash-10490
Jun  2 22:51:29 psmithub automount[29241]: failed to mount /user/.Trash-10490
Jun  2 22:51:29 psmithub automount[29342]: failed to mount /nfs/.Trash
Jun  2 22:51:29 psmithub automount[29343]: failed to mount /nfs/.Trash-10490
Jun  2 22:51:29 psmithub automount[29344]: failed to mount /mnt/.Trash
Jun  2 22:51:29 psmithub automount[29345]: failed to mount /mnt/.Trash-10490
Jun  2 22:51:29 psmithub automount[29346]: >> /sbin/showmount: can't get address for .Trash
Jun  2 22:51:29 psmithub automount[29346]: lookup(program): lookup for .Trash failed
Jun  2 22:51:29 psmithub automount[29346]: failed to mount /net/.Trash
Jun  2 22:51:29 psmithub automount[29353]: >> /sbin/showmount: can't get address for .Trash-10490
Jun  2 22:51:29 psmithub automount[29353]: lookup(program): lookup for .Trash-10490 failed
Jun  2 22:51:29 psmithub automount[29353]: failed to mount /net/.Trash-10490
Jun  2 22:51:34 psmithub automount[29212]: mount(nfs): nfs: mount failure snap-dev01:/tools on /opt/net/tools
Jun  2 22:51:34 psmithub automount[29212]: failed to mount /opt/net/tools

That's the last message of interest in the syslog.  Here's the end of
the shell script loop log:

Mon Jun  2 22:51:30 EDT 2008
rootfs / rootfs rw 0 0
none /sys sysfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
/dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 /dev/.static/dev ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /lib/modules/2.6.24-17-generic/volatile tmpfs rw,relatime 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
/dev/sda5 /home ext3 rw,relatime,data=ordered 0 0
securityfs /sys/kernel/security securityfs rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
automount(pid5466) /net autofs rw,relatime,fd=4,pgrp=5466,timeout=300,minproto=2,maxproto=4,indirect 0 0
automount(pid5367) /mnt autofs rw,relatime,fd=4,pgrp=5367,timeout=60,minproto=2,maxproto=4,indirect 0 0
automount(pid5404) /nfs autofs rw,relatime,fd=4,pgrp=5404,timeout=3600,minproto=2,maxproto=4,indirect 0 0
automount(pid5532) /user autofs rw,relatime,fd=4,pgrp=5532,timeout=300,minproto=2,maxproto=4,indirect 0 0
automount(pid5612) /export/autofs autofs rw,relatime,fd=4,pgrp=5612,timeout=60,minproto=2,maxproto=4,indirect 0 0
automount(pid5684) /opt/net autofs rw,relatime,fd=4,pgrp=5684,timeout=36000,minproto=2,maxproto=4,indirect 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0


Mon Jun  2 22:51:45 EDT 2008

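Then it just hangs: the 22:51:45 timestamp is written, but the cat of
/proc/mounts that follows never produces any output.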

If anyone has any thoughts about this, including ways I could proceed to
debug it, I'm interested!

_______________________________________________
autofs mailing list
http://linux.kernel.org/mailman/listinfo/autofs
