On Feb 24, 2013, at 6:59 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

> What protection do you mean? Check that /proc/pid/status exists? It is done 
> in Grep()

Ah, excellent -- I hadn't noticed that.

> We observe that process which was launched by mtt and hangs (mtt detect 
> timeout and starts do_command procedure), later enters into "defunct" state.

Looking at the code, you're checking for zombie status before MTT kills the 
proc.  Am I reading that right?

If so, the process may well have exited but not yet been reaped (because 
_kill_proc() hasn't been invoked yet).  In that case, is the real cause of the 
problem that OUTread and ERRread aren't being closed when the child process 
exits, so we keep looping, looking for new output from them?
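
To make the question concrete, here is a minimal sketch of the pattern I have in mind. It is Python rather than MTT's own code and is not taken from MTT; the name drain_and_reap and the timeout parameter are hypothetical, and it assumes a POSIX system where select() works on pipes. The idea is to stop watching each pipe as soon as a read returns zero bytes (EOF), and to reap the child once both pipes are done so it does not linger as a zombie.

```python
import os
import select
import subprocess
import sys


def drain_and_reap(cmd, timeout=5.0):
    """Read a child's stdout/stderr until both reach EOF, then reap it.

    `timeout` is an idle timeout: the longest we wait for new output
    on either pipe before giving up and killing the child.
    """
    child = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
    # Map each pipe's file descriptor to the buffer it feeds.
    open_fds = {child.stdout.fileno(): "out", child.stderr.fileno(): "err"}
    chunks = {"out": [], "err": []}

    while open_fds:
        ready, _, _ = select.select(list(open_fds), [], [], timeout)
        if not ready:
            break  # no output within the idle timeout
        for fd in ready:
            data = os.read(fd, 4096)
            if data:
                chunks[open_fds[fd]].append(data)
            else:
                # A zero-length read means EOF: the write end is closed,
                # so stop polling this pipe instead of looping on it.
                del open_fds[fd]

    timed_out = bool(open_fds)
    if timed_out:
        child.kill()
    # wait() reaps the child; until this call an exited child is a zombie.
    status = child.wait()
    child.stdout.close()
    child.stderr.close()
    return b"".join(chunks["out"]), b"".join(chunks["err"]), status, timed_out


if __name__ == "__main__":
    script = "import sys; sys.stdout.write('hi'); sys.stderr.write('oops')"
    print(drain_and_reap([sys.executable, "-c", script]))
```

If the loop never drops a pipe on EOF, it has no way to notice that the child is gone, which is the behavior I'm asking about above.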

Jeff Squyres