On Dec 17 2010, Suraj Prabhakaran wrote:
I am observing that when the parent spawns a child and the child terminates abruptly (for example, with exit() before MPI_Finalize()), the parent also terminates, even after both the child and the parent have explicitly called MPI_Comm_disconnect. This turns out to be a disadvantage. ...
Indeed. But that is what will sometimes happen, and it is not primarily an OpenMPI issue, though clearly OpenMPI should try to avoid it when possible. It is what happens under some circumstances on some systems. You really don't want to know why, I assure you :-(

The root cause is a combination of shoddy interface design and too many programs being too clever by half.

The following is the key information to provide:

1. The name and precise variants of the operating system, compilers and any libraries used for both parent AND child.

2. Whether the MPI was being run under a batch scheduler or similar controlling application and, if so, the precise variant of that.

3. The way in which the child failed (e.g. the signal number AND how that signal was generated). If you are sure that it happens with a plain exit(), you have answered this one already.

4. And, heaven help us all, sometimes the operating system, compiler, library and controller configuration, and the precise environment that the MPI program was running under. Sometimes even other actions of the child can matter.

Finding the last needs considerable expertise, even for an experienced administrator, so start with the first three. All of them are critical to this issue, unfortunately.

Regards,
Nick Maclaren.
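
As a rough aid for gathering the first three items, here is a minimal sketch in Python. It is not part of the original reply, and everything in it is an assumption to adapt: the tool names (mpirun, mpicc, cc), the scheduler environment variables checked, and the helper names collect_report and describe_exit are illustrative only. It should be run on the nodes hosting both parent and child, since the two may differ.

```python
import os
import platform
import shutil
import signal
import subprocess
import sys

# Assumed tool names; replace with the compilers and MPI launcher actually used.
VERSION_COMMANDS = {
    "mpirun": ["mpirun", "--version"],
    "mpicc": ["mpicc", "--version"],
    "cc": ["cc", "--version"],
}

# Job-id variables commonly set by some batch schedulers (not exhaustive).
SCHEDULER_VARS = ["SLURM_JOB_ID", "PBS_JOBID", "LSB_JOBID", "JOB_ID"]


def first_line(cmd):
    """Run cmd and return the first line of its output, or a short note."""
    if shutil.which(cmd[0]) is None:
        return "not found"
    try:
        out = subprocess.run(cmd, capture_output=True, text=True,
                             errors="replace", timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"failed: {exc}"
    text = (out.stdout or out.stderr).strip()
    return text.splitlines()[0] if text else "no output"


def describe_exit(returncode):
    """Turn a subprocess return code into 'exit status' or 'signal' wording.

    On POSIX, subprocess reports a child killed by signal N as -N.
    """
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


def collect_report():
    """Collect OS, tool versions and scheduler hints into a dictionary."""
    return {
        "os": platform.platform(),
        "libc": " ".join(platform.libc_ver()),
        "tools": {name: first_line(cmd)
                  for name, cmd in VERSION_COMMANDS.items()},
        "scheduler": {var: os.environ[var]
                      for var in SCHEDULER_VARS if var in os.environ},
    }


if __name__ == "__main__":
    report = collect_report()
    print("OS:   ", report["os"])
    print("libc: ", report["libc"])
    for name, version in report["tools"].items():
        print(f"{name}: {version}")
    print("scheduler variables:", report["scheduler"] or "none seen")
    # Stand-in for the failing child: a process that calls exit(3) directly.
    child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("test child", describe_exit(child.returncode))
```

Note that describe_exit only applies to a process started directly by the script. A child created with MPI_Comm_spawn is normally launched by the MPI runtime, so how it died usually has to be read from the runtime's or the scheduler's own output instead.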