Package: am-utils
Version: 6.2+rc20110530-3~mi
Severity: important
Tags: patch


Attached is a patch to fix a problem causing the am-utils init script
to get stuck when stopping, which typically means that the machine in
question fails to shut down or reboot.

The root cause of this problem seems to be improper ordering of the
actions required to bring down the automounter in a clean way.
In particular, what does /not/ work is this (from the original init
script):
> amq | awk '{print $1}' | tac | xargs -r -n 1 amq -u
> kill -s TERM "$pid"
> [ # wait 120 seconds for something to become umounted (pointlessly) ]
Here, we request a graceful unmount of all filesystems in question, but
an instant later we send SIGTERM.  Evidently amd does not have time to
fulfill those unmount requests: once it has received SIGTERM, it ceases
to act on amq's unmount requests (the init script waits 120 seconds
anyway).  Worse, after waiting that out, we finally reach:
> # Attempt to forcibly unmount the remaining filesystems
> [...]
> umount -l -f "$fs"
at which point we (and hence the shutdown) hang forever (at least
for NFS volumes, from what I can tell).

The attached patch fixes that issue by reordering the actions as follows:
1. Request (via amq -u) the umounting of all automounted filesystems
2. Wait (max N seconds) until everything has been umounted
3. /Then/ send SIGTERM
4. Wait (max M seconds) for the amd process to disappear
5. If it didn't disappear, we try our last-resort measures
    i.e. umount -f  followed by kill -9, like the original init script does
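
For illustration, the five steps above can be sketched as a standalone
shell fragment.  This is not the patch itself: the names wait_for,
is_gone, unmount_all, force_unmount_all, stop_daemon, UNMOUNT_WAIT and
EXIT_WAIT are made up for this sketch, and the default timeouts are only
the values mentioned in this report (20 and 120 seconds).

```shell
#!/bin/sh
# Sketch only: all function and variable names here are hypothetical.

UNMOUNT_WAIT=${UNMOUNT_WAIT:-20}   # "N" in the list above
EXIT_WAIT=${EXIT_WAIT:-120}        # "M" in the list above

# wait_for MAX CMD...: run CMD once per second until it succeeds.
# Returns 0 on success, 1 once MAX seconds have passed without success.
wait_for() {
    max=$1; shift
    n=0
    while ! "$@"; do
        [ "$n" -ge "$max" ] && return 1
        sleep 1
        n=$((n + 1))
    done
    return 0
}

# Succeeds once the given pid can no longer be signalled
# (signal 0 only probes, it delivers nothing).
is_gone() {
    ! kill -s 0 "$1" 2>/dev/null
}

# Ask amd to unmount whatever is still mounted.
# Succeeds only when nothing was left to unmount.
unmount_all() {
    remaining=$(amq | tac | awk '$2 != "root" && $2 != "toplvl" {print $1}')
    [ -z "$remaining" ] && return 0
    for f in $remaining; do amq -u "$f"; done
    return 1
}

# Last resort: lazy, forced unmount of everything amq still lists.
force_unmount_all() {
    amq | awk '$2 !~ "root" {print $1}' | tac | while read -r fs; do
        [ -d "$fs" ] && umount -l -f "$fs"
    done
}

stop_daemon() {
    pid=$1
    # Steps 1 and 2: request unmounts, retrying until done or timed out.
    wait_for "$UNMOUNT_WAIT" unmount_all ||
        echo "some filesystems are still mounted" >&2
    # Step 3: only now ask the daemon to exit.
    kill -s TERM "$pid" 2>/dev/null || true
    # Step 4: give it time to go away.
    wait_for "$EXIT_WAIT" is_gone "$pid" && return 0
    # Step 5: last-resort measures.
    force_unmount_all || true
    kill -s KILL "$pid" 2>/dev/null || true
    wait_for 10 is_gone "$pid"
}
```

The point of the sketch is only the ordering: SIGTERM is not sent until
the unmount loop has either succeeded or run out of time.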

The attached patch has reliably fixed those issues on our 150-ish Debian
office computers, where it has been in production for roughly two months.


PS: Besides the main issue, two minor bugs were addressed in one go:
- hide the ``kill: no such process'' message on the console we get
  when probing for amd's existence with kill -s 0

- remove the ``returncode'' variable, which didn't work anyway because
  it is assigned in a subshell
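
To illustrate the second point: a loop on the right-hand side of a pipe
runs in a subshell under dash (our /bin/sh), so an assignment made inside
it is lost.  The following is a minimal sketch, not code from the init
script; try_unmount and the /mnt paths are hypothetical stand-ins for the
real umount call and mount points.  Feeding the loop from a here-document
is one possible way to keep the assignment visible.

```shell
#!/bin/sh
# try_unmount is a hypothetical stand-in for `umount -l -f "$fs"`;
# it fails for one path so that there is a failure to record.
try_unmount() {
    [ "$1" != "/mnt/b" ]
}

# Variant 1: the loop is the last element of a pipeline.  Under dash it
# runs in a subshell, so the assignment never reaches the caller.
rc_pipe=0
printf '%s\n' /mnt/a /mnt/b | while read -r fs; do
    try_unmount "$fs" || rc_pipe=1
done

# Variant 2: the loop reads from a here-document and therefore runs in
# the current shell, so the assignment survives.
rc_heredoc=0
while read -r fs; do
    try_unmount "$fs" || rc_heredoc=1
done <<EOF
$(printf '%s\n' /mnt/a /mnt/b)
EOF

echo "pipe: $rc_pipe, here-document: $rc_heredoc"
```

Shells that run the last pipeline element in the current shell (ksh, for
instance) would report 1 for both variants, which is why the patch simply
drops the variable instead of relying on it.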


On a last note, to avoid confusion: we are currently using the wheezy
version of am-utils (6.2) on Debian squeeze.  This is due to an unrelated
issue with am-utils 6.1.5.  The init scripts of both versions are nearly
identical anyway.  (The local version suffix '~mi' in the 'Version' header
is due to us having already applied the proposed fix to the init script.)


Best Regards,

Timo Buhrmester,
Bonn University, Germany

-- System Information:
Debian Release: 6.0.8
  APT prefers oldstable
  APT policy: (500, 'oldstable')
Architecture: i386 (i686)

Kernel: Linux 3.2.51.wap (SMP w/4 CPU cores)
Locale: LANG=en_US.iso885915, LC_CTYPE=en_US.iso885915 (charmap=ISO-8859-15)
Shell: /bin/sh linked to /bin/dash

Versions of packages am-utils depends on:
ii  debconf              1.5.36.1            Debian configuration management sy
ii  debianutils          3.4                 Miscellaneous utilities specific t
ii  libamu4              6.2+rc20110530-3~mi Support library for amd the 4.4BSD
ii  libc6                2.11.3-4            Embedded GNU C Library: Shared lib
ii  libgdbm3             1.8.3-9             GNU dbm database routines (runtime
ii  libhesiod0           3.0.2-20            Project Athena's DNS-based directo
ii  libldap-2.4-2        2.4.23-7.3          OpenLDAP libraries
ii  libwrap0             7.6.q-19            Wietse Venema's TCP wrappers libra
ii  portmap              6.0.0-2             RPC port mapper
ii  ucf                  3.0025+nmu1         Update Configuration File: preserv

am-utils recommends no packages.

Versions of packages am-utils suggests:
pn  am-utils-doc                  <none>     (no description available)
pn  nis                           <none>     (no description available)

-- Configuration Files:
/etc/am-utils/amd.conf changed:
[global]
  auto_dir = /amd
  log_file = syslog
  log_options = all,noinfo,nostats,nomap
  restart_mounts = yes
  selectors_in_defaults = yes
  unmount_on_exit = yes
  vendor = Debian
  map_type = ldap
  ldap_base = "ou=amd maps,dc=math,dc=uni-bonn,dc=de"
  ldap_hostports = ldap.math.uni-bonn.de:389
  ldap_proto_version = 3
 
[/home]
  map_name = home
[/import/www]
  map_name = www
[/import/home.ag]
  map_name = home.ag
[/import/home.sek]
  map_name = home.sek


-- debconf information:
  am-utils/import-amd-failed:
  am-utils/nis-master-map: amd.master
  am-utils/clustername:
  am-utils/nis-key: default
  am-utils/rpc-localhost:
  am-utils/map-others:
  am-utils/nis-master-map-key-style: onekey
* am-utils/map-net: false
  am-utils/use-nis: false
  am-utils/nis-custom: echo "/amd-is-misconfigured /usr/share/am-utils/amd.net"
  am-utils/import-amd-conf-done: false
  am-utils/import-amd-conf: false
  am-utils/map-home: false
  am-utils/log-to-file:
--- debian/am-utils.init        2013-11-04 10:18:51.000000000 +0100
+++ /etc/init.d/am-utils        2013-09-13 15:18:13.000000000 +0200
@@ -121,33 +121,24 @@
   # This function tries to kill an amd which hasn't exited nicely.
   # This happens really easily, especially if using mount_type autofs
 
-  # Get the currently mounted filesystems
-  filesystems=`/bin/tempfile -s .amd`
-  if [ $? -ne 0 ]; then
-    return 1
-  fi
-
-  amq | awk '$2 !~ "root" {print $1}' > $filesystems
+  # Attempt to forcibly unmount the remaining filesystems
+  amq | awk '$2 !~ "root" {print $1}' | tac | while read fs; do
+    if [ -d "$fs" ]; then
+        sleep 1
+        umount -l -f "$fs"
+    fi
+  done
 
   # Kill the daemon
   kill -9 "$pid"
 
   sleep 10
 
-  if kill -s 0 $pid; then
-    rm -f $filesystems
+  if kill -s 0 $pid 2>/dev/null; then
     return 1
   fi
 
-  # Attempt to forcibly unmount the remaining filesystems
-  returncode=0
-  tac $filesystems | while read fs; do
-    umount -l -f "$fs" || returncode=1
-    sleep 1
-  done
-
-  rm -f $filesystems
-  return $returncode
+  return 0
 }
 
 stop_amd() {
@@ -161,9 +152,33 @@
            return
        fi
     fi
-    echo -n "Requesting amd unmount filesystems: "
-    amq | awk '{print $1}' | tac | xargs -r -n 1 amq -u
-    echo " done."
+    echo -n "Requesting and waiting for amd unmount filesystems "
+
+    i=0
+    maxsecs=20
+
+    while [ $i -le $maxsecs ]; do
+        echo -n "."
+
+        empty=true
+        for f in $(amq | tac | awk '$2 != "root" && $2 != "toplvl" {print $1}'); do
+            empty=false
+            amq -u $f
+        done
+
+        if $empty; then break; fi
+
+        sleep 1
+        i="`expr $i + 1`"
+        [ $i -eq 15 ] && echo -n " [will wait `expr $maxsecs - $i` secs more] "
+    done
+
+    if $empty; then
+        echo " done."
+    else
+        echo " failed."
+    fi
+
     echo -n "Stopping automounter: amd "
     kill -s TERM "$pid"
     # wait until amd has finished; this may take a little bit, and amd can't
