Thanks, Ralph. I will add it to one of the UH Jenkins scripts.

----------

Sent from my smartphone, so please excuse the typos.

Howard

On Sep 16, 2015 10:28 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
> Actually, Edgar attached a simple reproducer to the first message in this
> thread.
>
>
> On Wed, Sep 16, 2015 at 7:27 PM, Howard Pritchard <hpprit...@gmail.com>
> wrote:
>
>> Edgar,
>>
>> Do you have a simple test we could run with the Jenkins ghprb plugin that
>> would catch this going forward?
>>
>> I could add it to some of the checks we run on your UH slave node.
>>
>> Howard
>>
>> ----------
>>
>> Sent from my smartphone, so please excuse the typos.
>>
>> Howard
>> On Sep 16, 2015 12:36 PM, "Nathan Hjelm" <hje...@lanl.gov> wrote:
>>
>>>
>>> I see the problem. Before my changes, ompi_comm_dup signalled that the
>>> communicator was not an inter-communicator by setting remote_size to
>>> 0. The remote size is now taken from the remote group whenever one is
>>> supplied (which is the case with intra-communicators), so ompi_comm_dup
>>> needs to make sure NULL is passed for the remote_group when duplicating
>>> intra-communicators.
>>>
>>> I opened a PR. Once Jenkins finishes, I will merge it into master.
>>>
>>> -Nathan
>>>
>>> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
>>> > Yes, I did a fresh pull this morning. For me it deadlocks reliably
>>> > with 2 or more processes.
>>> >
>>> > Thanks
>>> > Edgar
>>> >
>>> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
>>> > >
>>> > >The reproducer is working for me with master on OS X 10.10. Some
>>> > >changes to ompi_comm_set went in yesterday. Are you on the latest hash?
>>> > >
>>> > >-Nathan
>>> > >
>>> > >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
>>> > >>Something is borked right now on master in the management of inter-
>>> > >>vs. intra-communicators. It looks like intra-communicators are wrongly
>>> > >>selecting the inter coll module, thinking that they are
>>> > >>inter-communicators, and we get hangs because of that.
>>> > >>I attach a small reproducer in which a bcast on a duplicate of
>>> > >>MPI_COMM_WORLD hangs because the inter collective module is being
>>> > >>selected.
>>> > >>
>>> > >>Thanks
>>> > >>Edgar
>>> > >
>>> > >>#include <stdio.h>
>>> > >>#include "mpi.h"
>>> > >>
>>> > >>int main( int argc, char *argv[] )
>>> > >>{
>>> > >>    MPI_Comm comm1;
>>> > >>    int root=0;
>>> > >>    int rank2, size2, global_buf=1;
>>> > >>    int rank, size;
>>> > >>
>>> > >>    MPI_Init ( &argc, &argv );
>>> > >>
>>> > >>    MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
>>> > >>    MPI_Comm_size ( MPI_COMM_WORLD, &size );
>>> > >>
>>> > >>    /* Setting up a new communicator */
>>> > >>    MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
>>> > >>
>>> > >>    MPI_Comm_size ( comm1, &size2 );
>>> > >>    MPI_Comm_rank ( comm1, &rank2 );
>>> > >>
>>> > >>    /* Bcast on MPI_COMM_WORLD itself: expected to complete */
>>> > >>    MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
>>> > >>    if ( rank == root ) {
>>> > >>        printf("Bcast on MPI_COMM_WORLD finished\n");
>>> > >>    }
>>> > >>    /* Bcast on the duplicate: hangs if the inter coll module is selected */
>>> > >>    MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
>>> > >>    if ( rank == root ) {
>>> > >>        printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
>>> > >>    }
>>> > >>
>>> > >>    MPI_Comm_free ( &comm1 );
>>> > >>
>>> > >>    MPI_Finalize ();
>>> > >>    return ( 0 );
>>> > >>}
>>> > >
>>> > >>_______________________________________________
>>> > >>devel mailing list
>>> > >>de...@open-mpi.org
>>> > >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> > >>Link to this post:
>>> > >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
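To make Nathan's diagnosis concrete, here is a minimal sketch in C of how a duplicated intra-communicator could end up classified as an inter-communicator. Everything in it (`group_t`, `comm_t`, `comm_set`, `comm_dup_buggy`, `comm_dup_fixed`) is hypothetical and far simpler than Open MPI's real `ompi_comm_set` and `ompi_comm_dup`. It only assumes, as the thread suggests, that an intra-communicator carries a remote group (its own local group) and that a supplied remote group with a non-zero size is what now marks a communicator as inter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL, simplified stand-ins for Open MPI's internal group and
 * communicator objects; not the real ompi_group_t / ompi_communicator_t. */
typedef struct {
    int size; /* number of processes in the group */
} group_t;

typedef struct {
    group_t *local_group;
    group_t *remote_group; /* for an intra-communicator this aliases local_group */
    bool is_inter;         /* decides whether an intra or inter coll module is chosen */
} comm_t;

/* HYPOTHETICAL setup routine (not the real ompi_comm_set signature):
 * a NULL remote group means "intra-communicator"; any supplied remote group
 * with a non-zero size is taken to mean "inter-communicator". */
static void comm_set(comm_t *comm, group_t *local, group_t *remote)
{
    assert(comm != NULL && local != NULL);
    comm->local_group = local;
    /* An intra-communicator reuses its local group as the "remote" group. */
    comm->remote_group = (remote != NULL) ? remote : local;
    comm->is_inter = (remote != NULL && remote->size > 0);
}

/* Buggy duplication: always forwards the old remote group, so duplicating
 * an intra-communicator produces a copy that looks like an inter-communicator. */
static void comm_dup_buggy(const comm_t *old, comm_t *newcomm)
{
    comm_set(newcomm, old->local_group, old->remote_group);
}

/* Fixed duplication: pass NULL as the remote group for intra-communicators,
 * as proposed in the thread, so the copy stays an intra-communicator. */
static void comm_dup_fixed(const comm_t *old, comm_t *newcomm)
{
    comm_set(newcomm, old->local_group, old->is_inter ? old->remote_group : NULL);
}
```

In this model, `comm_dup_buggy` applied to an intra-communicator yields `is_inter == true`, which mirrors the reported symptom (the duplicate of MPI_COMM_WORLD picks the inter coll module and the bcast hangs). `comm_dup_fixed` keeps intra duplicates intra and leaves duplicates of genuine inter-communicators unchanged.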