On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
> Yes, there is a race, apparently, with the child zombie still not finishing
> sending the SIGCHLD to the parent and parent exiting.  The following should
> fix the issue, but I do not think that reproducing the problem is easy.

> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
> index c524fe5df37..ba5ff84e9de 100644
> --- a/sys/kern/kern_exit.c
> +++ b/sys/kern/kern_exit.c
> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>  {
>       struct proc *p, *nq, *q, *t;
>       struct thread *tdt;
> +     ksiginfo_t ksi;
>
>       mtx_assert(&Giant, MA_NOTOWNED);
>       KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>                       proc_reparent(q, q->p_reaper);
>                       if (q->p_state == PRS_ZOMBIE) {
>                               PROC_LOCK(q->p_reaper);
> -                             pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
> +                             if (q->p_ksi != NULL) {
> +                                     ksiginfo_init(&ksi);
> +                                     ksiginfo_copy(q->p_ksi, &ksi);
> +                             }
> +                             pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
> +                                 NULL ? &ksi : NULL);
>                               PROC_UNLOCK(q->p_reaper);
>                       }
>               } else {

This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
should always be the zombie's p_ksi; otherwise, the siginfo may be lost
if there are too many signals pending for the target process or in the
system. If the siginfo is lost and the reaper normally passes si_pid to
waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
will remain until the reaper terminates.

Conceptually the siginfo is sent to one process at a time only, so the
bug is an artifact of the implementation. Perhaps the piece of code
added in r309886 can be moved or the ksiginfo can be removed from the
parent's queue.

If such a fix is not possible, it may be better to send a bare SIGCHLD
(si_code is SI_KERNEL or 0, depending on how many signals are pending)
in this situation and document that reapers must use WAIT_ANY or P_ALL.
(However, compared to the pre-r309886 situation they can still use
SIGCHLD to get notified when to call waitpid() or similar.)
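
For comparison, a reaper following the WAIT_ANY rule could look roughly
like the sketch below. This is an illustration only, not code from the
tree; reap_exited_children() and spawn_children() are hypothetical
names, and the loop assumes it is called after the process has been
notified of SIGCHLD by whatever means it uses. It treats SIGCHLD purely
as a wakeup and never looks at si_pid, so a lost siginfo does not leave
a zombie behind.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

#ifndef WAIT_ANY
#define WAIT_ANY (-1)	/* fallback for systems that do not define it */
#endif

/*
 * Hypothetical helper: collect every child that has already exited,
 * without relying on si_pid.  Returns the number of children reaped.
 */
static int
reap_exited_children(void)
{
	int status, reaped = 0;
	pid_t pid;

	for (;;) {
		pid = waitpid(WAIT_ANY, &status, WNOHANG);
		if (pid > 0) {
			reaped++;
			continue;
		}
		if (pid == -1 && errno == EINTR)
			continue;
		/* 0: no exited child right now; -1 with ECHILD: no children. */
		break;
	}
	return (reaped);
}

/*
 * Hypothetical demo helper: fork n children that exit immediately.
 * Returns the number of children actually started.
 */
static int
spawn_children(int n)
{
	int i, started = 0;
	pid_t pid;

	for (i = 0; i < n; i++) {
		pid = fork();
		if (pid == 0)
			_exit(0);
		if (pid > 0)
			started++;
	}
	return (started);
}
```

Because several exits may be covered by a single SIGCHLD, the loop runs
until waitpid() reports that nothing more is ready rather than reaping
one child per signal.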

Jilles Tjoelker
freebsd-current@freebsd.org mailing list