Actually, Edgar attached a simple reproducer to the first message in this thread.
On Wed, Sep 16, 2015 at 7:27 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
> Edgar
>
> Do you have a simple test we could run with jenkins ghprb that would catch
> this going forward?
>
> I could add it to some of the checks we run on your UH slave node.
>
> Howard
>
> ----------
>
> sent from my smart phone so no good typing.
>
> Howard
> On Sep 16, 2015 12:36 PM, "Nathan Hjelm" <hje...@lanl.gov> wrote:
>
>>
>> I see the problem. Before my changes ompi_comm_dup signalled that the
>> communicator was not an inter-communicator by setting remote_size to
>> 0. The remote size is now taken from the remote group if one was supplied
>> (which is the case with intra-communicators), so ompi_comm_dup needs to
>> make sure NULL is passed for the remote_group when duplicating
>> intra-communicators.
>>
>> I opened a PR. Once jenkins finishes I will merge it onto master.
>>
>> -Nathan
>>
>> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
>> > yes, I did a fresh pull this morning; for me it deadlocks reliably for 2
>> > and more processes.
>> >
>> > Thanks
>> > Edgar
>> >
>> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
>> > >
>> > >The reproducer is working for me with master on OS X 10.10. Some changes
>> > >to ompi_comm_set went in yesterday. Are you on the latest hash?
>> > >
>> > >-Nathan
>> > >
>> > >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
>> > >>something is borked right now on master in the management of inter vs.
>> > >>intra communicators. It looks like intra communicators are wrongly
>> > >>selecting the inter coll module, thinking that it is an inter
>> > >>communicator, and we have hangs because of that. I attach a small
>> > >>reproducer, where a bcast on a duplicate of MPI_COMM_WORLD hangs,
>> > >>because the inter collective module is being selected.
>> > >>
>> > >>Thanks
>> > >>Edgar
>> > >
>> > >>#include <stdio.h>
>> > >>#include "mpi.h"
>> > >>
>> > >>int main( int argc, char *argv[] )
>> > >>{
>> > >>    MPI_Comm comm1;
>> > >>    int root=0;
>> > >>    int rank2, size2, global_buf=1;
>> > >>    int rank, size;
>> > >>
>> > >>    MPI_Init ( &argc, &argv );
>> > >>
>> > >>    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
>> > >>    MPI_Comm_size ( MPI_COMM_WORLD, &size );
>> > >>
>> > >>    /* Setting up a new communicator */
>> > >>    MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
>> > >>
>> > >>    MPI_Comm_size ( comm1, &size2 );
>> > >>    MPI_Comm_rank ( comm1, &rank2 );
>> > >>
>> > >>    MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
>> > >>    if ( rank == root ) {
>> > >>        printf("Bcast on MPI_COMM_WORLD finished\n");
>> > >>    }
>> > >>    MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
>> > >>    if ( rank == root ) {
>> > >>        printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
>> > >>    }
>> > >>
>> > >>    MPI_Comm_free ( &comm1 );
>> > >>
>> > >>    MPI_Finalize ();
>> > >>    return ( 0 );
>> > >>}
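
On the CI question: besides running Edgar's reproducer itself, a quick sanity check along the same lines is sketched below (this is not from Edgar's attachment, just a sketch). It asks MPI directly, via MPI_Comm_test_inter, whether the duplicate of MPI_COMM_WORLD is being treated as an inter-communicator. One caveat: the coll selection may key off internal state (the remote size Nathan mentioned) rather than the user-visible flag, so a clean flag does not rule out the hang; treat this only as a supplement to the reproducer, not a replacement.

#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    MPI_Comm comm1;
    int rank, flag;

    MPI_Init ( &argc, &argv );
    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );

    /* A duplicate of an intra-communicator must itself be an
       intra-communicator, so MPI_Comm_test_inter should set flag to 0. */
    MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
    MPI_Comm_test_inter ( comm1, &flag );

    if ( rank == 0 ) {
        printf ( "dup of MPI_COMM_WORLD reports inter = %d (expected 0)\n", flag );
    }

    MPI_Comm_free ( &comm1 );
    MPI_Finalize ();
    return 0;
}

Like the reproducer, it only needs two or more ranks to be meaningful.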