Re: [MTT devel] fix zombie commit

Jeff Squyres (jsquyres) Tue, 26 Feb 2013 05:32:23 -0500

On Feb 26, 2013, at 2:11 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:


> On Mon, Feb 25, 2013 at 6:24 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> >Looking at the code, you're checking for zombie status before MTT kills the 
> >proc.  Am I reading that right?
> I don't think the order matters. If the process is not a zombie yet and is
> about to be killed by MTT later, that is a good flow.
> If the process is already a zombie, MTT will not be able to kill it anyway and
> can stop waiting and switch to the new task.

No, the _kill_proc() routine does both a kill() and a waitpid().  The waitpid() 
should reap the zombie.

I.e., if the process has died, MTT simply hasn't reaped it yet.  Hence, 
it's a zombie.
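
As a minimal sketch of that point (in Python rather than MTT's Perl, and not 
taken from the MTT source): kill_and_reap() below is a hypothetical stand-in 
for what _kill_proc() is described as doing, i.e. a kill() followed by a 
waitpid().  The child has usually already exited by the time the parent acts, 
so it is a zombie; the waitpid() is what removes it.

```python
import os
import signal
import time


def kill_and_reap(pid):
    """Hypothetical stand-in for _kill_proc(): kill(), then waitpid()."""
    try:
        os.kill(pid, signal.SIGKILL)  # harmless if the child is already a zombie
    except ProcessLookupError:
        pass  # no such pid at all
    reaped_pid, _status = os.waitpid(pid, 0)  # this call reaps the zombie
    return reaped_pid


def demo():
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    time.sleep(0.2)  # parent: child is most likely dead but unreaped by now
    reaped = kill_and_reap(pid)
    try:
        os.waitpid(pid, os.WNOHANG)
        gone = False
    except ChildProcessError:
        gone = True  # nothing left to reap: the zombie entry is gone
    return reaped == pid, gone
```

Calling demo() on a POSIX system should show that the waitpid() returns the 
child's pid and that a second waitpid() then fails because no child remains.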

> >If so, then it could well be that the process has exited but not yet been 
> >reaped (because _kill_proc() hasn't been invoked yet).  If this is the case, 
> >is the real cause of the problem that the OUTread and ERRread aren't being 
> >closed when the child process exits, and therefore we keep looping looking 
> >for new output from them?
> yep, sounds like it can be the cause, need to look into this code.

Ok.  It would be interesting to see if the process dies, but:

1) MTT is still blocking in select() (i.e., OUTread and ERRread aren't returning 
0 from sysread upon process death)

2) $done is somehow not getting set to 0, and therefore MTT is still looping 
until the timeout expires
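
To make the two cases concrete, here is a rough sketch (again Python, not the 
actual MTT code) of a select() loop over two pipes.  The names drain() and 
demo() are made up for illustration, and the set open_fds plays the role of a 
countdown like $done.  One thing it highlights: a read only returns 0 bytes 
once every copy of the pipe's write end has been closed, so if the parent (or 
some other process) still holds a write end open, the loop would keep waiting 
until the timeout.

```python
import os
import select


def drain(fds, timeout=5.0):
    """Read from fds until each one reports EOF or select() times out."""
    open_fds = set(fds)
    chunks = {fd: b"" for fd in fds}
    while open_fds:
        readable, _, _ = select.select(list(open_fds), [], [], timeout)
        if not readable:
            break  # timeout: no EOF seen on the remaining descriptors
        for fd in readable:
            data = os.read(fd, 4096)
            if data:
                chunks[fd] += data
            else:
                open_fds.discard(fd)  # 0-byte read means EOF on this pipe
    return chunks, len(open_fds)


def demo():
    out_r, out_w = os.pipe()
    err_r, err_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write one line to each pipe, then exit.
        os.close(out_r)
        os.close(err_r)
        os.write(out_w, b"out\n")
        os.write(err_w, b"err\n")
        os._exit(0)
    # Parent: close its own copies of the write ends, or EOF never arrives.
    os.close(out_w)
    os.close(err_w)
    chunks, remaining = drain([out_r, err_r])
    os.waitpid(pid, 0)  # reap the child
    os.close(out_r)
    os.close(err_r)
    return chunks[out_r], chunks[err_r], remaining
```

If both pipes reach EOF, the third value returned by demo() is 0; a nonzero 
value would correspond to case 1 above, where the loop only ends on timeout.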

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
