On 2/13/22 1:31 AM, סאן עמר wrote:
> Hey, first of all thanks for your answers and sorry for how I phrase my 
> question.

Sorry, looks like I may be the one who needs to apologize. (See below.)

> Rob, I still need a clarification if you don't mind. anything that you mention
> is making sense to me if the timeout implementation here was that the actual
> command is a child process of a process that doing the timeout logic.

One way that wait() can become unreliable is if the child process inherited a
signal mask from its parent (or grandparent) process with SIGCHLD ignored. Let's
see what happens in that case, via this dumb little test program:

$ cat blah.c
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
  int pid, status = 0;

  signal(SIGCHLD, SIG_IGN);
  if (!(pid = fork())) {sleep(1); exit(1);}
  else {
    pid = wait(&status);
    printf("exited %d %d\n", pid, status);
  }

  exit(1);
}

So wait() blocks until the child exits, then returns -1 with status left at 0.
Let's see what busybox */timeout.c is doing:

        int status;
...
        wait(&status); /* wait for child to die */
        /* Did intermediate [v]fork or exec fail? */
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                return EXIT_FAILURE;

Which means it's not checking wait's return value, and with a 0 status
_WIFEXITED() returns... um, some grep -r under /usr/include says:

#define __WEXITSTATUS(status)   (((status) & 0xff00) >> 8)
#define __WTERMSIG(status)      ((status) & 0x7f)
#define __WIFEXITED(status)     (__WTERMSIG(status) == 0)

So 0 reads as "it exited" and "exit status 0". So busybox timeout gets a false
positive for success. (Except that "status" was uninitalized so it's only
PROBABLY zero, not necessarily required to be. These days signal handlers have
their own stack so it's probably clean if the call to this function is the
first/deepest call made on the stack so far, which wouldn't necessarily be the
case for busybox calling shell builtins...

Anyway, if it doesn't EXIT_FAILURE it falls through to:

        /* Ok, exec a program as requested */
        argv += optind;
#if !BB_MMU
        argv[0] = sv1;
        argv[1] = sv2;
#endif
        BB_EXECVP_or_die(argv);

Wait, the PARENT process is execing the command line, but is doing so AFTER
waiting for the child? What?

The child is sending a kill signal to the parent process? Ok, that is EXACTLY
the "cannot reliably work" that the other guy was replying to you, that's 
BONKERS.

static NOINLINE int timeout_wait(int timeout, pid_t pid)
{
        /* Just sleep(HUGE_NUM); kill(parent) may kill wrong process! */
...
                if (kill(pid, 0)) {
...
}

And the caller is:
           /* Here we are grandchild. Sleep, then kill grandparent */
 grandchild:
           if (timeout_wait(timeout, parent) == EXIT_SUCCESS)

I just... no? It literally HAS A COMMENT saying that what it's doing won't work
reliably?

At this point, I think _I_ need some explanation about what's going on here.

> which is the case for other implementations I saw (for example here
> <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/timeout.c;h=98c064a18d54e64e151d8340b8151f9b00240885;hb=HEAD>)
> but for the case of busybox implementation, the timeout is a child process of
> the command it runs on (actually a grandchild),

Yeah, sorry. That IS bonkers.

The "this API cannot be used sanely" complaint was wrong, but "busybox is not
using this API sanely" is ALSO wrong. To reliably monitor a process you either
need to be its parent, or set up a similar relationship like containers, or
ptrace it, or you can pull some tricks with a filehandle to its /proc/$PID
contents...

But not this.

> so the case of process that
> represents the command finish and its parent releasing its PID (via wait) can
> occur, while the process that represents the timeout missing it.
> and all this can happen while there is no case of ignoring the SIGCHLD.
> am I wrong?

My bad. I thought the busybox code here was sane. It does not appear to be.
You're right: reversing the parent/child relationship like this CANNOT be 
reliable.

I assumed busybox's implementation worked like the one I wrote at
https://github.com/landley/toybox/blob/master/toys/other/timeout.c where the if
(!pid) does the exec() and the else case handles waiting, and it just uses an
alarm timer to interrupt itself. But apparently not?

(Sorry I didn't check earlier, but I don't usually look at busybox code anymore
because I work on a project under a non-GPL license and don't want to get sued.)

Rob
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox

Reply via email to