On 18-Feb-00 Steven N. Hirsch wrote:
> I'm running 2.2.14 w/ your autofs kernel patches and latest utilities.
OK, so that's 4.0.0pre6?
> Twice in the last week, I've had an automounted NFS export "disappear" and
> refuse to remount without restarting autofs.
Is the symptom that the /net/cy directory still exists, but there's nothing
mounted on it? Is there anything mounted underneath? If you do a force expire
(kill -USR1 <pid of the automount daemon for /net>), does it delete the
mountpoint and allow a remount?
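
The first two questions can be answered from the mount table. Below is a minimal sketch in Python, assuming text in the Linux /proc/mounts layout (device first, mount point second). The function name mounts_at_or_under and the sample table are invented for illustration and are not part of autofs.

```python
def mounts_at_or_under(mount_table, path):
    """Return (device, mount_point) pairs mounted on `path` or below it.

    `mount_table` is text in /proc/mounts format: one mount per line,
    whitespace-separated, device first and mount point second.
    """
    path = path.rstrip("/") or "/"
    prefix = "/" if path == "/" else path + "/"
    found = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, mount_point = fields[0], fields[1]
        if mount_point == path or mount_point.startswith(prefix):
            found.append((device, mount_point))
    return found


# Hypothetical table matching the reported symptom: cy:/usr/src is mounted,
# but nothing is mounted on /net/cy itself.
SAMPLE = """\
/dev/hda1 / ext2 rw 0 0
cy:/usr/src /net/cy/usr/src nfs rw 0 0
"""

if __name__ == "__main__":
    for device, mount_point in mounts_at_or_under(SAMPLE, "/net/cy"):
        print(device, "on", mount_point)
```

On a live system the same function could be fed the contents of /proc/mounts. If the result lists /net/cy/usr/src but no entry for /net/cy itself, that matches the "nested mount without its parent" case.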
> Initial access to '/net/cy/*' mounts 'cy:/' and nests 'cy:/usr/src'
> beneath it. At some point, 'cy:/' times out and refuses to remount. Here
> is a log extract (I don't know what, if anything, identd might have to do
> with it):
identd should be irrelevant, unless you have some strange NFS server which uses
it...
> Feb 16 23:27:02 pii automount[6736]: expired /net/cy
> Feb 17 07:09:54 pii automount[536]: attempting to mount entry /net/cy
>
> -----> Sometime in here, 'cy:/' failed to remount!
>
> I lost all incoming messages from a run of 'fetchmail' (my incoming mail
> spool is nfs-mounted and they went into the bit-bucket).
(Fetchmail throws things away if something breaks? Bad fetchmail!)
> As root, I umounted 'cy:/usr/src' and tried to cd into /net/cy several
> times to trigger a remount of 'cy:/'. No luck. It would mount
> 'cy:/usr/src', i.e. the only thing under /net/cy would be 'usr/src/*'.
> The root volume was just not visible.
So did it mount /net/cy/usr/src, but not /net/cy itself? That's very strange.
If for some reason the first mount of /net/cy failed then it will go on to
mount /net/cy/usr/src, creating a mount-point if necessary. If it times that
out, then it should clean things up properly. Rather than unmounting things
manually, you should send automount a USR1 to get it to clean things up (or at
least send a USR1 after doing the umount).
> Feb 17 08:02:22 pii automount[9728]: mount(nfs): entry cy: \
> host cy.fast.net: lookup failure
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Looks like a DNS problem?
> Feb 17 19:34:41 pii in.identd[10963]: started
>
> I tried to shut it down and restart, but something was busy:
>
> Feb 17 19:42:23 pii automount[532]: shutting down, path = /misc
> Feb 17 19:42:23 pii automount[11490]: expired /net/cy
> Feb 17 19:42:23 pii automount[536]: shutdown failed: filesystem still busy
Was there something still using the /net directory (as cwd perhaps)?
> Feb 17 19:42:56 pii automount[536]: attempting to mount entry /net/cy
>
> ------> Here it came back!
It will have cleaned everything up while doing the shutdown. When the shutdown
failed because /net was busy, it would have returned to normal operating state.
> Only after forcibly trying to shut down, did the root volume reappear.
> This has happened about 3x over the past week.
It looks to me like the first mount failed, but the second succeeded; I've
seen similar things here when DNS is in a slightly strange state, where it
fails one query and succeeds for the same one immediately after.
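
A lookup that fails once and then succeeds can be tolerated by retrying it. The following is an illustrative Python sketch of that idea, not what automount does. The name resolve_with_retry, its parameters and the injectable resolver argument are assumptions made for the example.

```python
import socket
import time


def resolve_with_retry(host, attempts=3, delay=0.5,
                       resolver=socket.gethostbyname):
    """Resolve `host`, retrying when the lookup fails.

    `resolver` is injectable so the retry logic can be exercised without
    a real DNS server; by default it is socket.gethostbyname.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(host)
        except OSError as err:  # socket.gaierror is a subclass of OSError
            last_error = err
            if attempt + 1 < attempts:
                time.sleep(delay)  # give the name server a moment
    raise last_error
```

A retry hides the transient failure but does not explain it, so the name server is still worth checking if this happens regularly.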
To get more info about what it's trying to do in detail, you can compile
automount with DEBUG set (see near the top of daemon/automount.c), or look at
the NFS server logs to see what mount attempts it actually saw.
I guess a fix I can apply is to never allow partial mounts; if one mount fails,
back out any that succeeded and fail the entire mount.
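
The all-or-nothing idea can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not automount's actual C code. The function mount_all_or_nothing and the do_mount/do_umount callbacks are hypothetical stand-ins for the real mount and umount operations.

```python
def mount_all_or_nothing(entries, do_mount, do_umount):
    """Mount every entry in order; on any failure, undo the ones that worked.

    `entries` is a list of (source, mount_point) pairs, parents first.
    `do_mount(source, mount_point)` and `do_umount(mount_point)` are
    caller-supplied stand-ins for the real operations; do_mount is
    expected to raise OSError on failure.
    Returns True if everything mounted, False if the mount was backed out.
    """
    mounted = []
    for source, mount_point in entries:
        try:
            do_mount(source, mount_point)
        except OSError:
            # Back out in reverse order so nested mounts go before parents.
            for done in reversed(mounted):
                do_umount(done)
            return False
        mounted.append(mount_point)
    return True
```

With this shape, a failed mount of cy:/ means cy:/usr/src is never attempted, and a failure on cy:/usr/src unmounts cy:/ again, so /net/cy is either fully populated or absent.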
J