On 27/06/11 23:55, Alan Curry wrote:
> =?UTF-8?Q?P=C3=A1draig?= Brady writes:
>>
>> This is a multi-part message in MIME format.
>> --------------000003030307000505070101
>> Content-Type: text/plain; charset=ISO-8859-1
>> Content-Transfer-Encoding: 8bit
>>
>> On 27/06/11 21:12, Alan Curry wrote:
>>>
>>> What seems to be happening is that make *doesn't* create a process group,
>>> therefore assumes that when it gets a SIGINT, its children have already
>>> gotten it too, and it just waits for them to die. A child that puts itself
>>> into a new process group screws this up (as would kill -2 `pidof make`).
>>
>> Thanks for the analysis Alan.
>> Yes you're right I think.
>> In any case the important point is that timeout sets itself as group leader,
>> and is not the foreground group.
> 
> Right, we have a tree of process groups that goes roughly
> shell->make->timeout and the one in the middle of the tree is the
> foreground, receiving tty-based signals.
> 
>>
>>>
>>> I think the answer is that timeout should put itself into the foreground.
>>> That way it would get the SIGINT. make wouldn't get it, but wouldn't need 
>>> to.
>>> timeout would exit quickly after SIGINT and make would proceed or abort
>>> according to the exit code.
>>
>> I've a version locally here actually that calls tcsetpgrp() but I discounted
>> that as it's not timeout's place to call that I think.
>> timeout sets itself as group leader so that it can kill everything it starts,
>> but it shouldn't need to grab the foreground group as the shell (or make)
>> may be starting it in the background etc.
> 
> It seems like this is a misuse of process groups, using them as if they
> were a handle for killing a whole tree of processes. That's not what
> they're for. Process groups were invented to support job control, which
> means the only program that was supposed to mess with them was csh. Only
> the lack of a "kill process tree" primitive (and the fact that you can't
> even query the process tree easily) tempts us into using process groups
> as a shortcut.
> 
> Any non-job-control-aware parent process will have a problem with
> timeout's behavior. We've already seen what GNU make does. pmake simply
> dies of the SIGINT and leaves the child processes lingering (it probably
> also assumes they got the SIGINT, and doesn't bother waiting for them).
> 
> In an interactive shell with job control disabled (set +m in most
> Bourne-ish shells), the behavior is not good there either. dash, bash,
> and posh all act like GNU make, appearing to ignore the SIGINT. zsh acts
> more like pmake, printing a new prompt but leaving the timeout and its
> child running.
> 
> timeout's pgrp behavior only appears harmless when the parent process is
> a shell with job control, which expects its children to be in separate
> process groups. But in that case, timeout doesn't need to put itself in
> a new process group because the shell has already done so.
> 
> So I suggest that if you create a process group, you take on the
> responsibility of behaving like a job control shell in other ways,
> including managing the foreground group. (An important piece of that is
> remembering the original value and restoring it before you exit).

I'm still not convinced we need to be messing with tcsetpgrp()
but you're right in that the disconnect between the timeout
process group and that of whatever starts `timeout` should be bridged.

I'm testing the attached patch at the moment (which I'll split into 2).
It only creates a separate group for the child that `timeout` execs,
leaving the timeout process in the original group to propagate signals down.

I'll need to do lots of testing with this before I commit.

cheers,
Pádraig.
diff --git a/src/timeout.c b/src/timeout.c
index a686225..3d5e8db 100644
--- a/src/timeout.c
+++ b/src/timeout.c
@@ -26,9 +26,6 @@
      EXIT_ENOENT        127      couldn't find job to exec
 
    Caveats:
-     If user specifies the KILL (9) signal is to be sent on timeout,
-     the monitor is killed and so exits with 128+9 rather than 124.
-
      If you start a command in the background, which reads from the tty
      and so is immediately sent SIGTTIN to stop, then the timeout
      process will ignore this so it can timeout the command as expected.
@@ -108,9 +105,9 @@ cleanup (int sig)
           alarm (kill_after);
           kill_after = 0; /* Don't let later signals reset kill alarm.  */
         }
-      send_sig (0, sig);
+      send_sig (-monitored_pid, sig);
       if (sig != SIGKILL && sig != SIGCONT)
-        send_sig (0, SIGCONT);
+        send_sig (-monitored_pid, SIGCONT);
     }
   else /* we're the child or the child is not exec'd yet.  */
     _exit (128 + sig);
@@ -283,12 +280,6 @@ main (int argc, char **argv)
 
   argv += optind;
 
-  /* Ensure we're in our own group so all subprocesses can be killed.
-     Note we don't just put the child in a separate group as
-     then we would need to worry about foreground and background groups
-     and propagating signals between them.  */
-  setpgid (0, 0);
-
   /* Setup handlers before fork() so that we
      handle any signals caused by child, without races.  */
   install_signal_handlers (term_signal);
@@ -310,6 +301,9 @@ main (int argc, char **argv)
       signal (SIGTTIN, SIG_DFL);
       signal (SIGTTOU, SIG_DFL);
 
+      /* Ensure we're in our own group so all subprocesses can be killed.  */
+      setpgid (0, 0);
+
       execvp (argv[0], argv);   /* FIXME: should we use "sh -c" ... here?  */
 
       /* exit like sh, env, nohup, ...  */
@@ -341,7 +335,16 @@ main (int argc, char **argv)
           if (WIFEXITED (status))
             status = WEXITSTATUS (status);
           else if (WIFSIGNALED (status))
-            status = WTERMSIG (status) + 128; /* what sh does at least.  */
+            {
+              int sig = WTERMSIG (status);
+              if (!timed_out)
+                {
+                  /* FIXME: use sigaction to ensure exit without cores etc. */
+                  signal (sig, SIG_DFL);
+                  raise (sig); /* Exit with signal flag set.  */
+                }
+              status = sig + 128; /* Exit with this if didn't exit above.  */
+            }
           else
             {
               /* shouldn't happen.  */
diff --git a/tests/misc/timeout b/tests/misc/timeout
index 7506e7c..c7867c6 100755
--- a/tests/misc/timeout
+++ b/tests/misc/timeout
@@ -39,8 +39,13 @@ test $? = 124 || fail=1
 # kill delay. Note once the initial timeout triggers,
 # the exit status will be 124 even if the command
 # exits on its own accord.
-timeout -s0 -k1 1 sleep 10
-test $? = 124 && fail=1
+cat <<\EOF > sleep_msg
+sleep $1
+echo completed
+EOF
+chmod a+x sleep_msg
+timeout -s0 -k1 1 ./sleep_msg 10 > out
+test -s out && fail=1
 
 # Ensure `timeout` is immune to parent's SIGCHLD handler
 # Use a subshell and an exec to work around a bug in FreeBSD 5.0 /bin/sh.

Reply via email to