It’s probably a race condition caused by uniting the ORTE and OPAL async threads, though I can’t confirm that yet.

> On Jul 17, 2015, at 3:11 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> Folks,
>
> I noticed several errors such as
> http://mtt.open-mpi.org/index.php?do_redir=2244
> that did not make any sense to me (at first glance)
>
> I was able to attach one process when the issue occurs.
> the sigsegv occurs in thread 2, while thread 1 is invoking ompi_rte_finalize.
>
> All I can think is a scenario in which the progress thread (aka thread 2) is
> still dealing with some memory that was just freed/unmapped/corrupted by the
> main thread.
>
> I empirically noticed the error is more likely to occur when there are many
> tasks on one node
> e.g. mpirun --oversubscribe -np 32 a.out
>
> Cheers,
>
> Gilles
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/07/17652.php
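
To make the suspected scenario concrete, here is a minimal sketch of the shutdown ordering that avoids it. This is not Open MPI code: the names (progress_ctx_t, progress_start, progress_finalize) are hypothetical, and it only assumes plain pthreads and C11 atomics. The point is that the main thread asks the progress thread to stop and joins it before freeing anything the thread may still be reading. Freeing first would reproduce the kind of use-after-free Gilles describes.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state shared by the main thread and a progress thread. */
typedef struct {
    pthread_t thread;
    atomic_bool stop;   /* set by the main thread to request shutdown */
    atomic_long ticks;  /* incremented by the progress thread on each pass */
    int *buffer;        /* heap memory the progress thread keeps reading */
    size_t len;
} progress_ctx_t;

static void *progress_loop(void *arg)
{
    progress_ctx_t *ctx = arg;
    while (!atomic_load(&ctx->stop)) {
        long sum = 0;
        /* This read is only safe while the main thread has not freed buffer. */
        for (size_t i = 0; i < ctx->len; i++) {
            sum += ctx->buffer[i];
        }
        (void)sum;
        atomic_fetch_add(&ctx->ticks, 1);
        sched_yield();
    }
    return NULL;
}

/* Allocate the shared state and start the progress thread; NULL on failure. */
progress_ctx_t *progress_start(size_t len)
{
    progress_ctx_t *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL) {
        return NULL;
    }
    ctx->buffer = calloc(len, sizeof(*ctx->buffer));
    if (ctx->buffer == NULL) {
        free(ctx);
        return NULL;
    }
    ctx->len = len;
    atomic_init(&ctx->stop, false);
    atomic_init(&ctx->ticks, 0);
    if (pthread_create(&ctx->thread, NULL, progress_loop, ctx) != 0) {
        free(ctx->buffer);
        free(ctx);
        return NULL;
    }
    return ctx;
}

/* Stop and join the progress thread, then free. Returns the pass count. */
long progress_finalize(progress_ctx_t *ctx)
{
    assert(ctx != NULL);
    atomic_store(&ctx->stop, true);   /* 1. ask the thread to exit */
    pthread_join(ctx->thread, NULL);  /* 2. wait until it really has exited */
    long ticks = atomic_load(&ctx->ticks);
    free(ctx->buffer);                /* 3. only now release shared memory */
    free(ctx);
    return ticks;
}
```

Swapping steps 2 and 3 in progress_finalize gives the racy version. Whether it crashes depends on scheduling, which would fit the observation that the failure shows up more often when a node is oversubscribed.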