On 03/09/2013 12:01 AM, Serge Hallyn wrote: > Detection of SIGCHLD from the container init by the monitor process > which spawned it is done during lxc_poll. If the monitor is slow > and the init (especially if using lxc-init to run /bin/true) exits > quickly, it can send its SIGCHLD before lxc_poll starts. In that > case lxc_poll ends up hanging forever waiting for the SIGCHLD, > while the init process is a zombie waiting to be reaped.
This problem has already been solved a couple of years ago. I suspect there is another bug. The signalfd is set *before*, so if a signal arrives while starting the container, we should enter and exit immediately the mainloop. That could have an edge/level triggered problem but it is not the case. The problem is coming from the checking of the pid when it is received by the monitor. lxc-execute 1362829720.335 NOTICE lxc_execute - '/bin/true' started with pid '18591' lxc-execute 1362829721.335 INFO lxc_console - no rootfs, no console. lxc-execute 1362829721.335 WARN lxc_start - invalid pid for SIGCHLD: 18591 <> 18590 So it is ignoring the signal. This problem shouldn't appears with lxc-start. > > If you want to verify that race, you can simply add a sleep(3) right > after the lxc_set_state(... RUNNING) in lxc_spawn, then do > > sudo lxc-execute -n somecontainer -- true > > I suspect a clean way to handle this is using sigaction just between > lxc_spawn and lxc_poll to set a flag if the SIGCHLD from container > init is detected. However, I haven't yet coded that fix yet. If > someone wants to do that before I get to it that would be great. > In the meantime, another fix which works is below. On the one hand > it feels ugly because it's not using signals. On the other hand, > the sigaction method will be more complicated and less obvious... > > One thing I'm not sure of (and which will complicate either way) > is whether it is the case that any signals delivered between > the lxc_mainloop_add_handler() and the actual running of the > lxc_mainloop() will be queued up and handled in the lxc_mainloop(). > > This should fix at least 2 bugs, > https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1134923 > https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1144873 > > and maybe a third > https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1124526 > > Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com> > --- > src/lxc/start.c | 34 +++++++++++++++++++++++++++++++++- > 1 file changed, 33 insertions(+), 1 deletion(-) > > diff --git a/src/lxc/start.c b/src/lxc/start.c > index f0e82a3..75d1fd6 100644 > --- a/src/lxc/start.c > +++ b/src/lxc/start.c > @@ -220,6 +220,28 @@ static int setup_signal_fd(sigset_t *oldmask) > return fd; > } > > +static int is_zombie(int pid) > +{ > + char path[PATH_MAX]; > + FILE *f; > + int ret, junkint; > + char state; > + > + ret = snprintf(path, PATH_MAX, "/proc/%d/stat", pid); > + if (ret < 0 || ret >= PATH_MAX) > + return 0; > + f = fopen(path, "r"); > + if (!f) // if it doesn't exist, process doesn't exist... > + return 0; > + ret = fscanf(f, "%d %s %c", &junkint, path, &state); > + fclose(f); > + if (ret != 3) > + return 0; > + if (state == 'Z') > + return 1; > + return 0; > +} > + > static int signal_handler(int fd, void *data, > struct lxc_epoll_descr *descr) > { > @@ -358,6 +380,7 @@ int lxc_poll(const char *name, struct lxc_handler > *handler) > int sigfd = handler->sigfd; > int pid = handler->pid; > struct lxc_epoll_descr descr; > + int ret = -1; > > if (lxc_mainloop_open(&descr)) { > ERROR("failed to create mainloop"); > @@ -390,13 +413,22 @@ int lxc_poll(const char *name, struct lxc_handler > *handler) > #endif > } > > + ret = 0; > + /* more robustness, protect ourself from a SIGCHLD sent > + * by a process different from the container init > + */ > + if (is_zombie(handler->pid)) { > + INFO("init has already exited"); > + goto out_mainloop_open; > + } > + > return lxc_mainloop(&descr); > > out_mainloop_open: > lxc_mainloop_close(&descr); > out_sigfd: > close(sigfd); > - return -1; > + return ret; > } > > extern int lxc_caps_check(void); ------------------------------------------------------------------------------ Symantec Endpoint Protection 12 positioned as A LEADER in The Forrester Wave(TM): Endpoint Security, Q1 2013 and "remains a good choice" in the endpoint security space. For insight on selecting the right partner to tackle endpoint security challenges, access the full report. http://p.sf.net/sfu/symantec-dev2dev _______________________________________________ Lxc-devel mailing list Lxc-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/lxc-devel