Edgar, do you have a simple test we could run with jenkins ghprb that would catch this going forward?
I could add it to some of the checks we run on your UH slave node.

Howard

----------
Sent from my smart phone, so no good typing.

Howard

On Sep 16, 2015 12:36 PM, "Nathan Hjelm" <hje...@lanl.gov> wrote:
> I see the problem. Before my changes ompi_comm_dup signalled that the
> communicator was not an inter-communicator by setting remote_size to 0.
> The remote size is now taken from the remote group if one was supplied
> (which is the case with intra-communicators), so ompi_comm_dup needs to
> make sure NULL is passed for the remote_group when duplicating
> intra-communicators.
>
> I opened a PR. Once jenkins finishes I will merge it onto master.
>
> -Nathan
>
> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> > Yes, I did a fresh pull this morning; for me it deadlocks reliably for
> > 2 and more processes.
> >
> > Thanks
> > Edgar
> >
> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> > >
> > > The reproducer is working for me with master on OS X 10.10. Some
> > > changes to ompi_comm_set went in yesterday. Are you on the latest
> > > hash?
> > >
> > > -Nathan
> > >
> > > On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> > >> Something is borked right now on master in the management of inter
> > >> vs. intra communicators. It looks like intra communicators are
> > >> wrongly selecting the inter coll module, thinking that it is an
> > >> inter communicator, and we have hangs because of that. I attach a
> > >> small replicator, where a bcast on a duplicate of MPI_COMM_WORLD
> > >> hangs, because the inter collective module is being selected.
> > >>
> > >> Thanks
> > >> Edgar
> > >>
> > >> #include <stdio.h>
> > >> #include "mpi.h"
> > >>
> > >> int main( int argc, char *argv[] )
> > >> {
> > >>     MPI_Comm comm1;
> > >>     int root=0;
> > >>     int rank2, size2, global_buf=1;
> > >>     int rank, size;
> > >>
> > >>     MPI_Init ( &argc, &argv );
> > >>
> > >>     MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
> > >>     MPI_Comm_size ( MPI_COMM_WORLD, &size );
> > >>
> > >>     /* Setting up a new communicator */
> > >>     MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> > >>
> > >>     MPI_Comm_size ( comm1, &size2 );
> > >>     MPI_Comm_rank ( comm1, &rank2 );
> > >>
> > >>     MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> > >>     if ( rank == root ) {
> > >>         printf("Bcast on MPI_COMM_WORLD finished\n");
> > >>     }
> > >>     MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
> > >>     if ( rank == root ) {
> > >>         printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> > >>     }
> > >>
> > >>     MPI_Comm_free ( &comm1 );
> > >>
> > >>     MPI_Finalize ();
> > >>     return ( 0 );
> > >> }
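For reference, here is a minimal, self-contained sketch of the rule Nathan describes. It uses stand-in types and names (group_t, comm_t, comm_setup, comm_dup), not the actual Open MPI internals or the code in his PR: the duplication path should hand a remote group to the setup routine only when the source really is an inter-communicator, so that remote_size stays 0 for intra-communicators and the inter coll module is not selected.

/*
 * Sketch only: stand-in types and names, not the actual Open MPI
 * internals or the change in Nathan's PR.
 */
#include <stddef.h>
#include <stdio.h>

typedef struct { int size; } group_t;      /* stand-in for ompi_group_t */

typedef struct {
    int      is_inter;                     /* stand-in for the inter-communicator flag */
    group_t *local_group;
    group_t *remote_group;
    int      remote_size;
} comm_t;                                  /* stand-in for ompi_communicator_t */

/* Stand-in for the setup routine: remote_size is derived from the
 * remote group when one is supplied, and stays 0 otherwise. */
static void comm_setup(comm_t *newcomm, group_t *local, group_t *remote)
{
    newcomm->local_group  = local;
    newcomm->remote_group = remote;
    newcomm->remote_size  = remote ? remote->size : 0;
    newcomm->is_inter     = (remote != NULL);
}

/* The rule from the thread: only pass the remote group through when the
 * communicator being duplicated is really an inter-communicator; for an
 * intra-communicator pass NULL so remote_size stays 0. */
static void comm_dup(const comm_t *old, comm_t *newcomm)
{
    comm_setup(newcomm,
               old->local_group,
               old->is_inter ? old->remote_group : NULL);
}

int main(void)
{
    group_t world = { 4 };
    /* Intra-communicator whose remote group happens to alias the local
     * group, as described in the thread. */
    comm_t comm_world = { 0, &world, &world, 0 };
    comm_t dup;

    comm_dup(&comm_world, &dup);
    printf("duplicate is an %s-communicator (remote_size = %d)\n",
           dup.is_inter ? "inter" : "intra", dup.remote_size);
    return 0;
}

Edgar's reproducer above could presumably also serve as the ghprb check Howard asks about: build it with mpicc, run it under mpirun with two or more ranks and a timeout, and fail the check if the second Bcast never completes.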