Re: [autofs] Re: [patch, rfc] fix hung automounter issues

Jeff Moyer Mon, 18 Apr 2005 16:59:55 -0700

==> Regarding Re: [autofs] Re: [patch, rfc] fix hung automounter issues; [EMAIL 
PROTECTED] adds:


raven> On Mon, 18 Apr 2005 [EMAIL PROTECTED] wrote:

raven> Hang on, I think I've got a problem with white space and my mailer.

raven> If you have problems with the patch in the previous mail try this one 
raven> instead (maybe just use this one).

Okay, I've reviewed all parts except for the actual logging functions.  I'm
actually not convinced this is the right approach.  The overhead of setting
up and tearing down the signal mask is quite expensive.  I fear that
environments with a great number of mounts will fall over when debugging is
turned on, and that is the very thing we are trying to save.  Perhaps it is
worth another look at a separate logging process.

Having said that, the rest of your patch is still relevant.  I've spent a
good deal of time looking it over, and here are my comments.

@@ -58,14 +65,12 @@ int kproto_sub_version = 0; /* Kernel pr

  static int submount = 0;

-int do_verbose = 0;            /* Verbose feedback option */
-int do_debug = 0;              /* Enable full debug output */
-
  sigset_t ready_sigs;          /* signals only accepted in ST_READY */
  sigset_t lock_sigs;           /* signals blocked for locking */
  sigset_t sigchld_mask;

  struct autofs_point ap;
+extern volatile int in_interrupt;

Doesn't need to be volatile, so far as I can tell.  Also note that we need
to be careful to reset in_interrupt if we ever fork() from a signal
handler.


@@ -565,7 +572,7 @@ static enum states handle_child(int hang

                /* Check to see if expire process finished */
                if (pid == ap.exp_process) {
-                       int success, ret;
+                       int success;

                        if (!WIFEXITED(status))
                                continue;
@@ -594,15 +601,8 @@ static enum states handle_child(int hang

                        case ST_SHUTDOWN_PENDING:
                                next = ST_SHUTDOWN;
-                               if (success) {
-                                       ret = ioctl(ap.ioctlfd,
-                                               AUTOFS_IOC_ASKUMOUNT, &status);
-                                       if (!ret) {
-                                               if (status)
-                                                       break;
-                                       } else
-                                               break;
-                               }
+                               if (success)
+                                       break;

                                /* Failed shutdown returns to ready */
                                warn("can't shutdown: filesystem %s still busy",

This would be okay, except that the AUTOFS_IOC_ASKUMOUNT won't always
be run at shutdown.  See below.


@@ -1495,13 +1498,11 @@ static void sig_supervisor(int sig)

        case SIGTERM:
        case SIGUSR2:
-               /* Tell everyone to finish up */
-               signal_children(sig);
+               ap.state = ST_SHUTDOWN_PENDING;
                break;

        case SIGUSR1:
-               /* Pass on the prune event and ignore self signal */
-               signal_children(sig);
+               ap.state = ST_PRUNE;
                break;

        case SIGCHLD:
@@ -1509,20 +1510,18 @@ static void sig_supervisor(int sig)
                break;

        case SIGHUP:
-               ap.lookup->lookup_ghost(ap.path, ap.ghost, 0, 
ap.lookup->context);
-
                /* Pass on the reread event and ignore self signal */
-               kill(0, SIGHUP);
-               discard_pending(SIGHUP);
-
+               ap.state = ST_READMAP;
                break;
        }
        errno = save_errno;
+       in_interrupt--;
  }

  int supervisor(char *path)
  {
        unsigned int map = 0;
+       int ret;

        ap.path = alloca(strlen(path) + 1);
        strcpy(ap.path, path);
@@ -1538,6 +1537,37 @@ int supervisor(char *path)

        setup_signals(sig_supervisor, sig_supervisor);

I think we should set state to ST_READY here.  That would make things a bit
more clear.

+       while (ap.state != ST_SHUTDOWN) {
+               switch (ap.state) {
+               case ST_READMAP:
+                       st_readmap();

No check of return value?

+                       signal_children(SIGHUP);
+                       ap.state = ST_READY;
+                       break;
+               case ST_SHUTDOWN_PENDING:
+                       ret = signal_children(SIGUSR2);
+                       if (!ret) {
+                               ap.state = ST_SHUTDOWN;
+                               break;
+                       }
+
+                       /* Failed shutdown returns to ready */
+                       warn("can't shutdown: filesystem %s still busy",
+                                    ap.path);
+                       ap.state = ST_READY;
+                       break;
+               case ST_PRUNE:
+                       /* Pass on the prune event and ignore self signal */
+                       signal_children(SIGUSR1);
+                       ap.state = ST_READY;
+                       break;
+               default:
+                       ap.state = ST_READY;
+                       break;
+               }
+               sleep(1);
+       }
+
        while (waitpid(0, NULL, 0) > 0);

        return 0;

Hmm, this is simply not safe, we can lose signals.  You really need to
either block signals and implement a queue, or use a pipe like we do with
indirect maps.

@@ -1644,8 +1674,23 @@ int handle_mounts(char *path)
                kill(my_pid, SIGSTOP);

        while (ap.state != ST_SHUTDOWN) {
-               if (handle_packet() && errno != EINTR)
-                       break;
+               if (handle_packet() && errno != EINTR) {

The checking of errno here is completely bogus.  You can simply get rid of
it, I think.  (this is what I was referring to above, about the ioctl not
always being run).

+                       int ret, status = 0;
+
+                       ret = ioctl(ap.ioctlfd, AUTOFS_IOC_ASKUMOUNT, &status);
+                       /* 
+                        * If the ioctl fails assume the kernel doesn't have
+                        * AUTOFS_IOC_ASKUMOUNT and just continue.
+                        */
+                       if (!ret && status)
+                               break;

Consider handling EINTR for the ioctl call.

+
+                       /* Failed shutdown returns to ready */
+                       warn("can't shutdown: filesystem %s still busy",
+                                    ap.path);
+                       ap.state = ST_READY;

I think we should be blocking signals when modifying ap.state.

+                       alarm(ap.exp_runfreq);
+               }
        }

        /* Mop up remaining kids */


Okay, random tangent:

static void sig_statemachine(int sig)
{
        int save_errno = errno;
        enum states next = ap.state;

        in_interrupt++;
        switch (sig) {
        default:                /* all the "can't happen" signals */
                error("process %d got unexpected signal %d!", getpid(), sig);
                break;
                /* don't FALLTHROUGH */
        ...
        }

        debug("sig %d switching from %d to %d", sig, ap.state, next);

        errno = save_errno;
        in_interrupt--;
}

So, for sigsegv, sigbus, sigfpe, etc, we are simply trying to continue?
That doesn't seem right to me, and the comment in setup_signals agrees:

#ifndef DEBUG
        /* When debugging, these signals should be in the default state; when
           in production, we want to at least attempt to catch them and shut 
down. */

Also note this from the sigaction man page:
       According to POSIX, the behaviour of a process is  undefined  after  it
       ignores  a  SIGFPE, SIGILL, or SIGSEGV signal that was not generated by
       the kill() or the raise() functions.   Integer  division  by  zero  has
       undefined result.  On some architectures it will generate a SIGFPE sig-
       nal.  (Also dividing the most  negative  integer  by  -1  may  generate
       SIGFPE.)  Ignoring this signal might lead to an endless loop.

Anyway, something to fix for later, I guess.


diff -Nurp autofs-4.1.4.orig/include/automount.h 
autofs-4.1.4/include/automount.h
--- autofs-4.1.4.orig/include/automount.h       2005-01-26 21:03:02.000000000 
+0800
+++ autofs-4.1.4/include/automount.h    2005-04-17 21:07:27.000000000 +0800
@@ -109,7 +109,7 @@ struct autofs_point {
        volatile pid_t exp_process;             /* Process that is currently 
expiring */
        volatile struct pending_mount *mounts;  /* Pending mount queue */
        struct lookup_mod *lookup;              /* Lookup module */
-       enum states state;
+       volatile enum states state;

Again, why the volatile?  Are you worried that the compiler is going to
optimize out some code?  We certainly don't have issues with multiple
threads accessing the data...

-Jeff

_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

Re: [autofs] Re: [patch, rfc] fix hung automounter issues

Reply via email to