On 03/09/2013 12:01 AM, Serge Hallyn wrote:
> Detection of SIGCHLD from the container init by the monitor process
> which spawned it is done during lxc_poll.  If the monitor is slow
> and the init (especially if using lxc-init to run /bin/true) exits
> quickly, it can send its SIGCHLD before lxc_poll starts.  In that
> case lxc_poll ends up hanging forever waiting for the SIGCHLD,
> while the init process is a zombie waiting to be reaped.

This problem has already been solved a couple of years ago. I suspect
there is another bug.

The signalfd is set *before*, so if a signal arrives while starting the
container, we should enter and exit immediately the mainloop.

That could have an edge/level triggered problem but it is not the case.

The problem is coming from the checking of the pid when it is received
by the monitor.

    lxc-execute 1362829720.335 NOTICE   lxc_execute - '/bin/true'
started with pid '18591'
    lxc-execute 1362829721.335 INFO     lxc_console - no rootfs, no console.
    lxc-execute 1362829721.335 WARN     lxc_start - invalid pid for
SIGCHLD: 18591 <> 18590

So it is ignoring the signal. This problem shouldn't appears with lxc-start.


>
> If you want to verify that race, you can simply add a sleep(3) right
> after the lxc_set_state(... RUNNING) in lxc_spawn, then do
>
>       sudo lxc-execute -n somecontainer -- true
>
> I suspect a clean way to handle this is using sigaction just between
> lxc_spawn and lxc_poll to set a flag if the SIGCHLD from container
> init is detected.  However, I haven't yet coded that fix yet.  If
> someone wants to do that before I get to it that would be great.
> In the meantime, another fix which works is below.  On the one hand
> it feels ugly because it's not using signals.  On the other hand,
> the sigaction method will be more complicated and less obvious...
>
> One thing I'm not sure of (and which will complicate either way)
> is whether it is the case that any signals delivered between
> the lxc_mainloop_add_handler() and the actual running of the
> lxc_mainloop() will be queued up and handled in the lxc_mainloop().
>
> This should fix at least 2 bugs,
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1134923
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1144873
>
> and maybe a third
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1124526
>
> Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com>
> ---
>  src/lxc/start.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/src/lxc/start.c b/src/lxc/start.c
> index f0e82a3..75d1fd6 100644
> --- a/src/lxc/start.c
> +++ b/src/lxc/start.c
> @@ -220,6 +220,28 @@ static int setup_signal_fd(sigset_t *oldmask)
>       return fd;
>  }
>  
> +static int is_zombie(int pid)
> +{
> +     char path[PATH_MAX];
> +     FILE *f;
> +     int ret, junkint;
> +     char state;
> +
> +     ret = snprintf(path, PATH_MAX, "/proc/%d/stat", pid);
> +     if (ret < 0 || ret >= PATH_MAX)
> +             return 0;
> +     f = fopen(path, "r");
> +     if (!f) // if it doesn't exist, process doesn't exist...
> +             return 0;
> +     ret = fscanf(f, "%d %s %c", &junkint, path, &state);
> +     fclose(f);
> +     if (ret != 3)
> +             return 0;
> +     if (state == 'Z')
> +             return 1;
> +     return 0;
> +}
> +
>  static int signal_handler(int fd, void *data,
>                          struct lxc_epoll_descr *descr)
>  {
> @@ -358,6 +380,7 @@ int lxc_poll(const char *name, struct lxc_handler 
> *handler)
>       int sigfd = handler->sigfd;
>       int pid = handler->pid;
>       struct lxc_epoll_descr descr;
> +     int ret = -1;
>  
>       if (lxc_mainloop_open(&descr)) {
>               ERROR("failed to create mainloop");
> @@ -390,13 +413,22 @@ int lxc_poll(const char *name, struct lxc_handler 
> *handler)
>               #endif
>       }
>  
> +     ret = 0;
> +     /* more robustness, protect ourself from a SIGCHLD sent
> +      * by a process different from the container init
> +      */
> +     if (is_zombie(handler->pid)) {
> +             INFO("init has already exited");
> +             goto out_mainloop_open;
> +     }
> +
>       return lxc_mainloop(&descr);
>  
>  out_mainloop_open:
>       lxc_mainloop_close(&descr);
>  out_sigfd:
>       close(sigfd);
> -     return -1;
> +     return ret;
>  }
>  
>  extern int lxc_caps_check(void);


------------------------------------------------------------------------------
Symantec Endpoint Protection 12 positioned as A LEADER in The Forrester  
Wave(TM): Endpoint Security, Q1 2013 and "remains a good choice" in the  
endpoint security space. For insight on selecting the right partner to 
tackle endpoint security challenges, access the full report. 
http://p.sf.net/sfu/symantec-dev2dev
_______________________________________________
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel

Reply via email to