Ralph,

I could not find anything wrong with loop_spawn, unless I am missing something obvious:

From MTT (http://mtt.open-mpi.org/index.php?do_redir=2196), all tests run this month (both trunk and v1.8) failed with a timeout, and there was no error message such as

  dpm_base_disconnect_init: error -12 in isend to process 1

loop_spawn tries to spawn 2000 tasks in 10 minutes. My system is not fast enough to achieve this, so the iteration count gets bumped:

  /* if time exceeded, then bump iteration count to the end */

The test would then succeed after 10 minutes and a few seconds (the extra seconds are needed to complete the last spawn and MPI_Finalize()).

The SLURM timeout is set to exactly 10 minutes, so the job is aborted before it has time to finish (and I believe it would have finished successfully).

You can either:
- increase the SLURM timeout (10min30s looks good to me),
- decrease nseconds in loop_spawn.c (570 looks good to me), or
- run mpirun ... dynamic/loop_spawn <nseconds>, where nseconds is "a bit less" than 600 seconds (once again, 570 looks good to me).

Did I miss something?

Cheers,

Gilles

On Wed, May 28, 2014 at 12:53 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> Ralph,
>
> On 2014/05/28 12:10, Ralph Castain wrote:
> > My understanding is that there are two ways of seeing things:
> > a) the "R-way": the problem is that the parent should not try to
> > communicate with already exited processes
> > b) the "J-way": the problem is that the children should have waited,
> > either in MPI_Comm_free() or in MPI_Finalize()
> >
> > I don't think you can use option (b) - we can't have the children
> > lingering around for the parent to call finalize, if I'm understanding
> > you correctly.
> You understood me correctly.
>
> Once again, I have not started investigating loop_spawn yet.
>
> In the case of intercomm_create, we would not run into this if the
> application had explicitly called MPI_Comm_free in the parent.
> So in this case *only*, and as explained by Jeff, b) could be an option
> to make OpenMPI happy.
> (To be blunt: if the user is not happy with children lingering around,
> he can explicitly call MPI_Comm_free before calling MPI_Comm_disconnect.)
>
> I will start investigating loop_spawn now.
>
> Cheers,
>
> Gilles
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14879.php
>