Edgar

Do you have a simple test we could run with jenkins ghprb that would catch
this going forward?

I could add it to some of the checks we run on your UH slave node.

Howard

----------

Sent from my smart phone, so please excuse any typos.

Howard
On Sep 16, 2015 12:36 PM, "Nathan Hjelm" <hje...@lanl.gov> wrote:

>
> I see the problem. Before my changes, ompi_comm_dup signalled that the
> communicator was not an inter-communicator by setting remote_size to 0.
> The remote size is now taken from the remote group if one was supplied
> (which is currently the case with intra-communicators), so ompi_comm_dup
> needs to make sure NULL is passed for the remote_group when duplicating
> intra-communicators.
>
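A rough sketch of the fix Nathan describes (illustrative only; the real
ompi_comm_dup/ompi_comm_set code takes many more arguments than shown, and
the variable names here are hypothetical):

   /* Illustrative sketch, not the actual Open MPI code.  Since
    * ompi_comm_set now derives remote_size from the remote group argument,
    * duplicating an intra-communicator has to pass NULL there; otherwise
    * the new communicator looks like an inter-communicator and the inter
    * coll module gets selected. */
   ompi_group_t *remote_group = NULL;

   if ( OMPI_COMM_IS_INTER(comm) ) {
       /* only an inter-communicator has a meaningful remote group */
       remote_group = comm->c_remote_group;
   }

   /* hypothetical, abbreviated call; the argument list is shortened here */
   rc = ompi_comm_set ( &newcomp, comm, comm->c_local_group, remote_group );
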
> I opened a PR. Once jenkins finishes I will merge it onto master.
>
> -Nathan
>
> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> > yes, I did a fresh pull this morning; for me it deadlocks reliably for 2
> > and more processes.
> >
> > Thanks
> > Edgar
> >
> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> > >
> > >The reproducer is working for me with master on OS X 10.10. Some changes
> > >to ompi_comm_set went in yesterday. Are you on the latest hash?
> > >
> > >-Nathan
> > >
> > >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> > >>something is borked right now on master in the management of inter vs.
> > >>intra communicators. It looks like intra communicators are wrongly
> > >>selecting the inter coll module, thinking that they are inter
> > >>communicators, and we have hangs because of that. I attach a small
> > >>replicator, where a bcast of a duplicate of MPI_COMM_WORLD hangs
> > >>because the inter collective module is being selected.
> > >>
> > >>Thanks
> > >>Edgar
> > >
> > >>#include <stdio.h>
> > >>#include "mpi.h"
> > >>
> > >>int main( int argc, char *argv[] )
> > >>{
> > >>   MPI_Comm comm1;
> > >>   int root=0;
> > >>   int rank2, size2, global_buf=1;
> > >>   int rank, size;
> > >>
> > >>   MPI_Init ( &argc, &argv );
> > >>
> > >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
> > >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
> > >>
> > >>/* Setting up a new communicator */
> > >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
> > >>
> > >>   MPI_Comm_size ( comm1, &size2 );
> > >>   MPI_Comm_rank ( comm1, &rank2 );
> > >>
> > >>
> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> > >>   if ( rank == root ) {
> > >>       printf("Bcast on MPI_COMM_WORLD finished\n");
> > >>   }
> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
> > >>   if ( rank == root ) {
> > >>       printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> > >>   }
> > >>
> > >>   MPI_Comm_free ( &comm1 );
> > >>
> > >>   MPI_Finalize ();
> > >>   return ( 0 );
> > >>}
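
On Howard's question about a simple jenkins check: one option (a sketch only;
the ghprb wiring itself is not shown) would be to extend the reproducer above
so it fails fast instead of hanging, for example with MPI_Comm_test_inter
right after the MPI_Comm_dup call:

   /* Sketch: could be added after the MPI_Comm_dup call above.
    * MPI_Comm_test_inter sets the flag only for inter-communicators, so a
    * duplicate of MPI_COMM_WORLD must report 0 here.  Whether this check
    * alone catches the bad code path depends on where the inter/intra
    * decision is taken; the hanging bcast plus a per-test timeout in CI is
    * the definitive symptom either way. */
   int is_inter = 0;
   MPI_Comm_test_inter ( comm1, &is_inter );
   if ( is_inter ) {
       fprintf ( stderr, "dup of MPI_COMM_WORLD reports an inter-communicator\n" );
       MPI_Abort ( MPI_COMM_WORLD, 1 );
   }

Compiled with mpicc and run under mpirun with a small process count and a
timeout, either the abort or a hang would show up as a jenkins failure.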