Hi,
Sorry for the incomplete description. I will trace the problem more closely
later next week and provide more details.

M


On Mon, Jun 23, 2014 at 10:13 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Ok, just got in to Chicago from my flight and am back online.
>
> Mike: you are still not providing very much information.  :-\
>
> Your first mails make it seem like MTT continues to run but leaves
> "launchers" (presumably mpirun processes) running even though they have no
> children left.  That would be very weird behavior for mpirun.  In that case,
> this could be both an MTT bug and an ORTE bug.
>
> But your last mail seems to imply that MTT is hanging indefinitely.
>
> Can you please provide a clear, precise description of what is happening?
>
> FWIW: Yes, we now kill the parent first, to give mpirun a chance to
> clean up, tell remote orteds to die, kill its child processes, etc.
> Killing the children first neither tests the common case of how people
> kill MPI processes (i.e., they kill mpirun) nor allows mpirun to tell
> remote processes to die.
>
> Do you run with --verbose output?  MTT should output messages like "***
> Killing mpirun with SIGTERM", and the like.  Do you see timeout messages at
> all?  I.e., is MTT not entering the timeout code at all?
>
> ...etc.
>
>
>
> On Jun 23, 2014, at 12:16 PM, Dave Goodell (dgoodell) <dgood...@cisco.com> wrote:
>
> > On Jun 23, 2014, at 8:48 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
> >
> >> BTW, I think that now, when the parent process is killed before the child,
> >> the OS marks the child as "<defunct>", and it sticks around for good.
> >
> > The grandparent should inherit the child.  If the grandparent then does
> > not wait(2) on the child, then the child will remain a zombie / defunct.
> > So in our specific case, this behavior will depend on what the parent
> > process of mpirun is and whether it is waiting on child processes
> > appropriately.
> >
> > -Dave
> >
> > _______________________________________________
> > mtt-devel mailing list
> > mtt-de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/mtt-devel/2014/06/0633.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> mtt-devel mailing list
> mtt-de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> Link to this post:
> http://www.open-mpi.org/community/lists/mtt-devel/2014/06/0634.php
>
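
A minimal sketch of the parent-first kill order discussed above, written in Python and not taken from MTT's actual code: the harness sends SIGTERM to the launcher, escalates to SIGKILL if it is still alive after a grace period, and always waits on it so the launcher itself is not left behind as "<defunct>". The function name `kill_launcher`, the `grace` parameter, and the stand-in child process in the demo are hypothetical. Note that this only reaps the launcher; on Linux, any processes it orphans are reparented to the nearest ancestor marked as a child subreaper (prctl PR_SET_CHILD_SUBREAPER), or otherwise to PID 1, which normally reaps them.

```python
import signal
import subprocess
import sys


def kill_launcher(proc: subprocess.Popen, grace: float = 5.0) -> str:
    """Stop a launcher (hypothetically, mpirun) parent-first and reap it.

    Sends SIGTERM so the launcher can clean up its own children and remote
    daemons, escalates to SIGKILL if it is still alive after `grace` seconds,
    and always waits on it so it does not linger as a <defunct> zombie.
    Returns the name of the last signal that had to be sent.
    """
    if proc.poll() is not None:
        # Already exited; poll() has already reaped it via waitpid().
        return "already-exited"
    proc.send_signal(signal.SIGTERM)
    try:
        # wait() calls waitpid(), which reaps the child.
        proc.wait(timeout=grace)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()  # reap it so it does not linger as a zombie
        return "SIGKILL"


if __name__ == "__main__":
    # Hypothetical stand-in for a launcher that ignores SIGTERM.
    stubborn = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN);"
         "print('ready', flush=True); time.sleep(60)"],
        stdout=subprocess.PIPE, text=True)
    stubborn.stdout.readline()  # wait until SIGTERM is being ignored
    print(kill_launcher(stubborn, grace=1.0))
    stubborn.stdout.close()
```

The escalation step matters because a launcher that is hung (or that ignores SIGTERM) would otherwise survive the timeout; the unconditional wait() is what addresses the "<defunct>" symptom for the launcher process itself.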
