On Dec 17 2010, Suraj Prabhakaran wrote:

I am observing a behavior where, when the parent spawns a child and the child terminates abruptly (for example, with exit() before MPI_Finalize()), the parent also terminates, even though both the child and the parent have explicitly called MPI_Comm_disconnect. This turns out to be a disadvantage. ...

Indeed.  But that is what will sometimes happen, and it's not primarily
an OpenMPI issue, though clearly OpenMPI should try to avoid it when
possible.  It is what happens under some circumstances on some systems.
You really don't want to know why, I assure you :-(  The root cause is
a combination of shoddy interface design and too many programs being too
clever by half.

The following is key information to provide:

   The name and precise variants of the operating system, compilers
and any libraries used, for both parent AND child.

   Whether the MPI program was being run under a batch scheduler or
similar controlling application and, if so, the precise variant of that.

   The way in which the child failed (e.g. the signal number AND how
that signal was generated).  If you are sure that it happens with a
plain exit(), you have answered this one already.

   And, heaven help us all, sometimes the operating system, compiler,
library and controller configuration, and the precise environment that
the MPI program was running under.  Sometimes even other actions of the
child can matter.

Tracking down the last item needs considerable expertise, even for an
experienced administrator, so start with the first three.  All of them
are critical to this issue, unfortunately.


Regards,
Nick Maclaren.
