On Wed, Feb 15, 2012 at 09:22:10AM -0800, Dmitry Mikulin wrote:
>
>
> On 02/15/2012 08:32 AM, Konstantin Belousov wrote:
> >On Mon, Feb 13, 2012 at 02:50:45PM -0800, Dmitry Mikulin wrote:
> >>>>>It seems that now wait4(2) can be called from the real (non-debugger)
> >>>>>parent first and result in the call to proc_reap(), isn't it ? We would
> >>>>>then just reparent the child back to the caller, still leaving the
> >>>>>zombie and confusing debugger.
> >>>>When either gdb or the real parent gets to proc_reap() the process
> >>>>wouldn't
> >>>>get destroyed, it'll get caught by the following clause:
> >>>>    if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
> >>>>
> >>>>and the real parent with get the child back into the children's list
> >>>>while
> >>>>gdb will get it into the orphan list. The second time around when
> >>>>proc_reap() is entered, p->p_oppid will be 0 and the process will get
> >>>>really reaped. Does it make sense? And proc_reparent() attempts to keep
> >>>>the
> >>>>orphan list clean and not have the same entries and the list of
> >>>>siblings.
> >>>Right, this is what I figured. But I asked about some further implication
> >>>of this change:
> >>>
> >>>if real parent spuriosly calls wait4(2) on the child pid after the child
> >>>exited, but before the debugger called the wait4(), then exactly the
> >>>code you noted above will be run. This results in the child being fully
> >>>returned to the original parent.
> >>>
> >>>Next, the wait4() call from debugger gets an error, and zombie will be
> >>>kept around until parent calls wait4() for this pid once more.
> >>>
> >>>Am I missed something ?
> >>In this case the process will move from gdb's child list to gdb's orphan
> >>list when the real parent does a wait4(). Next time around the wait loop
> >>in
> >>gdb it'll be caught by the orphan's proc_reap().
> >I do not see how the next debugger loop could find this process at all,
> >since the first wait4() call reparented it to the original parent.
>
> Not the debugger loop, the kern_wait() loop. The child get re-parented to
> the original parent but moves to the orphan list of the debugger process.
It makes no difference whether we mean the debugger loop which calls
wait4/waitpid, or the kern_wait() loop resulting from the debugger
calling wait*.

Could you please describe how the patched kernel moves the waited-for
zombie to the orphan list of the debugger ?

To me, it seems that there is another bug: the child appears both on
the children list and on the orphan list of the real parent.
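
To make the scenario under discussion concrete, here is a toy userland
model of the two-pass reap as it is described in the quoted text above.
It is only a sketch and not the patch: apart from p_oppid, every name in
it (toy_proc, toy_reap, parent_pid, orphan_of, reaped) is invented for
illustration, the pfind() lookup is assumed to succeed, and the move to
the tracer's orphan list is simply assumed to happen, which is exactly
the point in question.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userland model, not kernel code; only p_oppid mirrors a real field. */
struct toy_proc {
	int pid;
	int p_oppid;	/* pid of the real parent while traced, else 0 */
	int parent_pid;	/* whose children list the process is on */
	int orphan_of;	/* whose orphan list the process is on, else 0 */
	int reaped;	/* set once the zombie is really destroyed */
};

/*
 * Model of one pass through the reap path for a zombie.
 * Returns 1 if the zombie was destroyed, 0 if it was only handed back.
 */
static int
toy_reap(struct toy_proc *p)
{
	assert(p != NULL);
	if (p->p_oppid != 0) {
		/*
		 * First pass: park the process on the tracer's orphan list
		 * (assumed behaviour) and give it back to the real parent.
		 */
		p->orphan_of = p->parent_pid;
		p->parent_pid = p->p_oppid;
		p->p_oppid = 0;
		return (0);
	}
	/* Second pass: nothing left to hand back, destroy the zombie. */
	p->orphan_of = 0;
	p->reaped = 1;
	return (1);
}
```

In this model the first toy_reap() call on a traced zombie only
reparents it, and the second call destroys it, regardless of which
process issued either call. Whether the real code reaches the second
call from the debugger's side is what needs explaining.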