On Mon, Feb 25, 2013 at 6:24 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Looking at the code, you're checking for zombie status before MTT kills
> the proc. Am I reading that right?

I don't think the order matters. If the process is not a zombie yet and is about to be killed by MTT later, that is a good flow. If the process is already a zombie, MTT will not be able to kill it anyway, so it can stop waiting and switch to the new task.

> If so, then it could well be that the process has exited but not yet been
> reaped (because _kill_proc() hasn't been invoked yet). If this is the
> case, is the real cause of the problem that the OUTread and ERRread aren't
> being closed when the child process exits, and therefore we keep looping
> looking for new output from them?

Yep, that sounds like it could be the cause. I need to look into this code.
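
To make the suspected failure mode concrete, here is a minimal sketch in Python. It is not MTT's Perl code, and `drain_and_reap` is a hypothetical helper name. It assumes a POSIX system where pipes can be polled, and shows one way to avoid looping forever on a child that has exited but not been reaped: treat an empty read as EOF, close that pipe, and reap the child once both pipes are closed.

```python
import os
import selectors
import subprocess
import sys


def drain_and_reap(cmd):
    """Run cmd, read stdout/stderr until both pipes reach EOF, then reap.

    Hypothetical helper for illustration; assumes POSIX pipes.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ, "out")
    sel.register(proc.stderr, selectors.EVENT_READ, "err")
    chunks = {"out": [], "err": []}

    # Keep looping only while at least one pipe is still registered.
    while sel.get_map():
        for key, _events in sel.select(timeout=1.0):
            data = os.read(key.fileobj.fileno(), 4096)
            if data:
                chunks[key.data].append(data)
            else:
                # An empty read means EOF: the write end was closed, which
                # happens when the child exits. Stop watching this pipe so
                # the loop can end instead of waiting for more output.
                sel.unregister(key.fileobj)
                key.fileobj.close()
    sel.close()

    # Both pipes are closed; wait() reaps the child so it does not stay a zombie.
    status = proc.wait()
    return status, b"".join(chunks["out"]), b"".join(chunks["err"])


if __name__ == "__main__":
    status, out, err = drain_and_reap(
        [sys.executable, "-c", "print('done')"]
    )
    print(status, out, err)
```

If the real loop never sees or never acts on EOF for OUTread and ERRread, it would keep polling them after the child has exited, which matches the symptom described above.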