Hi, sorry for the incomplete description. I will trace the problem more closely later next week and provide more details.
M

On Mon, Jun 23, 2014 at 10:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Ok, just got in to Chicago from my flight and am back online.
>
> Mike: you are still not providing very much information. :-\
>
> Your first mails make it seem like MTT is continuing to run, but leaving
> "launchers" (assumedly mpirun processes) still running, but they have no
> children. Which would be very weird for mpirun to do, if it has no
> children left. This could be both an MTT and an ORTE bug, in this case.
>
> But your last mail seems to imply that MTT is hanging indefinitely.
>
> Can you please provide a clear, precise description of what is happening?
>
> FWIW: Yes, we are killing the parent first now, to give mpirun a chance to
> cleanup / tell remote orteds to die / kill children processes / etc.
> Killing the children first both doesn't test the common case of how people
> kill MPI processes (i.e., they kill mpirun), and it also doesn't allow
> mpirun to tell remote processes to die.
>
> Do you run with --verbose output? MTT should output messages like
> "*** Killing mpirun with SIGTERM", and the like. Do you see timeout
> messages at all? I.e., is MTT not entering the timeout code at all?
>
> ...etc.
>
>
> On Jun 23, 2014, at 12:16 PM, Dave Goodell (dgoodell) <dgood...@cisco.com> wrote:
>
> > On Jun 23, 2014, at 8:48 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
> >
> >> btw, i think now, when parent process is killed before child, OS makes
> >> child as "<defunct>" which stick around for good.
> >
> > The grandparent should inherit the child. If the grandparent then does
> > not wait(2) on the child, then the child will remain a zombie / defunct.
> > So in our specific case, this behavior will depend on what the parent
> > process of mpirun is and whether it is waiting on child processes
> > appropriately.
> >
> > -Dave
> >
> > _______________________________________________
> > mtt-devel mailing list
> > mtt-de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> > Link to this post: http://www.open-mpi.org/community/lists/mtt-devel/2014/06/0633.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> mtt-devel mailing list
> mtt-de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> Link to this post: http://www.open-mpi.org/community/lists/mtt-devel/2014/06/0634.php
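The kill order Jeff describes (SIGTERM to mpirun first) and Dave's point about wait(2) can be illustrated with a small sketch. This is not MTT's actual implementation; it is a hypothetical Python helper, `run_with_timeout`, assuming a POSIX system. It launches a process (e.g. mpirun), sends SIGTERM on timeout so the launcher can clean up its own children, escalates to SIGKILL after a grace period, and always reaps the process so it cannot linger as `<defunct>`. It only reaps the harness's direct child: on Linux, a process whose parent dies is normally reparented to init (PID 1), or to the nearest ancestor that marked itself a child subreaper via `prctl(PR_SET_CHILD_SUBREAPER)`, and that process is the one that must wait(2) on orphaned ranks.

```python
import subprocess


def run_with_timeout(argv, timeout, grace=5.0):
    """Hypothetical helper: run argv, SIGTERM on timeout, SIGKILL after grace.

    Popen.wait() calls waitpid() internally, so every path below reaps the
    child and it never remains in the process table as a zombie.
    Returns the exit status (negative signal number if killed by a signal).
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass

    # Polite kill first, so a launcher like mpirun gets a chance to tell
    # remote daemons to die and kill its own children.
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass

    # The launcher ignored SIGTERM: force it, then reap it.
    proc.kill()  # sends SIGKILL on POSIX
    return proc.wait()


if __name__ == "__main__":
    print(run_with_timeout(["true"], timeout=5))         # 0
    print(run_with_timeout(["sleep", "60"], timeout=1))  # -15 (SIGTERM)
```

The harness waits on the launcher even after killing it; skipping that final wait is exactly what leaves a `<defunct>` entry behind until the harness itself exits.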