Ok, thanks. In the meantime, please roll back to the v3.0.0 tag and you should be good. Sorry for the hassle. :-(
On Jun 25, 2014, at 12:19 AM, Mike Dubman <[email protected]> wrote:

> Hi
> sorry for incomplete description. will trace problem more closely later next
> week and provide.
>
> M
>
>
> On Mon, Jun 23, 2014 at 10:13 PM, Jeff Squyres (jsquyres)
> <[email protected]> wrote:
> Ok, just got in to Chicago from my flight and am back online.
>
> Mike: you are still not providing very much information. :-\
>
> Your first mails make it seem like MTT is continuing to run, but leaving
> "launchers" (assumedly mpirun processes) still running, but they have no
> children. Which would be very weird for mpirun to do, if it has no children
> left. This could be both an MTT and an ORTE bug, in this case.
>
> But your last mail seems to imply that MTT is hanging indefinitely.
>
> Can you please provide a clear, precise description of what is happening?
>
> FWIW: Yes, we are killing the parent first now, to give mpirun a chance to
> cleanup / tell remote orteds to die / kill children processes / etc. Killing
> the children first both doesn't test the common case of how people kill MPI
> processes (i.e., they kill mpirun), and it also doesn't allow mpirun to tell
> remote processes to die.
>
> Do you run with --verbose output? MTT should output messages like "***
> Killing mpirun with SIGTERM", and the like. Do you see timeout messages at
> all? I.e., is MTT not entering the timeout code at all?
>
> ...etc.
>
>
>
> On Jun 23, 2014, at 12:16 PM, Dave Goodell (dgoodell) <[email protected]>
> wrote:
>
> > On Jun 23, 2014, at 8:48 AM, Mike Dubman <[email protected]> wrote:
> >
> >> btw, i think now, when parent process is killed before child, OS makes
> >> child as "<defunct>" which stick around for good.
> >
> > The grandparent should inherit the child. If the grandparent then does not
> > wait(2) on the child, then the child will remain a zombie / defunct. So in
> > our specific case, this behavior will depend on what the parent process of
> > mpirun is and whether it is waiting on child processes appropriately.
> >
> > -Dave
> >
> > _______________________________________________
> > mtt-devel mailing list
> > [email protected]
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/mtt-devel/2014/06/0633.php
>
>
> --
> Jeff Squyres
> [email protected]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> mtt-devel mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> Link to this post:
> http://www.open-mpi.org/community/lists/mtt-devel/2014/06/0634.php
>
> _______________________________________________
> mtt-devel mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> Link to this post:
> http://www.open-mpi.org/community/lists/mtt-devel/2014/06/0637.php

--
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
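For reference, here is a minimal sketch of the kill sequence discussed in this thread, written in Python for POSIX systems. It is an illustration under stated assumptions, not MTT's actual implementation: the helper name `kill_and_reap`, the grace period, and the SIGTERM-then-SIGKILL escalation are hypothetical choices. It shows the two points raised above: the launcher (e.g. mpirun) is signalled first so it gets a chance to clean up its own children, and the process that spawned it then calls waitpid() on it, because a killed child that is never waited on stays behind as a `<defunct>` zombie.

```python
import os
import signal
import time


def kill_and_reap(pid: int, grace: float = 5.0, poll: float = 0.1) -> int:
    """Hypothetical helper: terminate one child process and reap it.

    Sends SIGTERM first so the process (e.g. a launcher like mpirun) can
    clean up its own children, escalates to SIGKILL if it has not exited
    within `grace` seconds, and always calls waitpid() so the killed process
    does not linger as a zombie. Returns the raw wait status.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        # With WNOHANG, waitpid returns (0, 0) while the child is still running.
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return status
        time.sleep(poll)
    # Still alive after the grace period: force it, then block until reaped.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    # Stand-in for mpirun: a child process that would otherwise run for a while.
    pid = os.fork()
    if pid == 0:
        time.sleep(60)
        os._exit(0)
    status = kill_and_reap(pid, grace=2.0)
    if os.WIFSIGNALED(status):
        print("terminated by", signal.Signals(os.WTERMSIG(status)).name)
```

Polling with `os.WNOHANG` keeps the grace period bounded; calling a blocking `os.waitpid(pid, 0)` right after SIGTERM would wait indefinitely if the process ignored or stalled on that signal.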
