Actually, Edgar attached a simple reproducer to the first message in this
thread.
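For anyone following along, here is a toy C model of the failure mode Nathan describes below. It is only a sketch. Every name in it (toy_group_t, toy_comm_t, toy_comm_set, toy_comm_dup_buggy, toy_comm_dup_fixed, toy_select_coll) is hypothetical and is not Open MPI's actual ompi_comm_set/ompi_comm_dup code. The model relies only on what Nathan states: intra-communicators carry a non-NULL remote group, a supplied remote group now determines the remote size, and a nonzero remote size marks a communicator as inter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; NOT Open MPI's real types. */
typedef struct {
    int size;
} toy_group_t;

typedef struct {
    toy_group_t *local_group;
    toy_group_t *remote_group; /* aliases local_group for intra-communicators */
    bool is_inter;
} toy_comm_t;

typedef enum { TOY_COLL_INTRA, TOY_COLL_INTER } toy_coll_t;

/* Models the rule described in the thread: if a remote group is supplied,
 * the remote size comes from it, and a nonzero remote size is taken to
 * mean "this is an inter-communicator". */
static void toy_comm_set(toy_comm_t *comm, toy_group_t *local,
                         toy_group_t *remote)
{
    assert(local != NULL);
    int remote_size = (remote != NULL) ? remote->size : 0;
    comm->local_group = local;
    comm->remote_group = (remote != NULL) ? remote : local;
    comm->is_inter = (remote_size > 0);
}

/* Buggy dup: always forwards the remote group. For an intra-communicator
 * that group is non-NULL, so the duplicate gets flagged as inter. */
static void toy_comm_dup_buggy(const toy_comm_t *old, toy_comm_t *newcomm)
{
    toy_comm_set(newcomm, old->local_group, old->remote_group);
}

/* Fixed dup: pass NULL for the remote group when the communicator being
 * duplicated is an intra-communicator. */
static void toy_comm_dup_fixed(const toy_comm_t *old, toy_comm_t *newcomm)
{
    toy_comm_set(newcomm, old->local_group,
                 old->is_inter ? old->remote_group : NULL);
}

/* Stand-in for collective-module selection: a misflagged intra-communicator
 * ends up with the inter module, which is what Edgar saw hang in MPI_Bcast. */
static toy_coll_t toy_select_coll(const toy_comm_t *comm)
{
    return comm->is_inter ? TOY_COLL_INTER : TOY_COLL_INTRA;
}
```

With a world-like communicator built via toy_comm_set(&world, &g, NULL), toy_comm_dup_buggy produces a duplicate with is_inter set, so the inter module is selected, while toy_comm_dup_fixed keeps the duplicate intra. Duplicating a real inter-communicator still yields an inter-communicator under the fixed version.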


On Wed, Sep 16, 2015 at 7:27 PM, Howard Pritchard <hpprit...@gmail.com>
wrote:

> Edgar
>
> Do you have a simple test we could run with jenkins ghprb that would catch
> this going forward?
>
> I could add it to some of the checks we run on your UH slave node.
>
> Howard
>
> ----------
>
> sent from my smart phone, so no good typing.
>
> Howard
> On Sep 16, 2015 12:36 PM, "Nathan Hjelm" <hje...@lanl.gov> wrote:
>
>>
>> I see the problem. Before my changes ompi_comm_dup signalled that the
>> communicator was not an inter-communicator by setting remote_size to
>> 0. The remote size is now taken from the remote group if one was supplied
>> (which is the case with intra-communicators) so ompi_comm_dup needs to
>> make sure NULL is passed for the remote_group when duplicating
>> intra-communicators.
>>
>> I opened a PR. Once jenkins finishes I will merge it onto master.
>>
>> -Nathan
>>
>> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
>> > Yes, I did a fresh pull this morning; for me it deadlocks reliably with 2
>> > or more processes.
>> >
>> > Thanks
>> > Edgar
>> >
>> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
>> > >
>> > >The reproducer is working for me with master on OS X 10.10. Some changes
>> > >to ompi_comm_set went in yesterday. Are you on the latest hash?
>> > >
>> > >-Nathan
>> > >
>> > >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
>> > >>Something is borked right now on master in the management of inter
>> > >>vs. intra communicators. It looks like intra-communicators are
>> > >>wrongly selecting the inter coll module, thinking that they are
>> > >>inter-communicators, and we have hangs because of that. I attach a
>> > >>small reproducer in which a bcast on a duplicate of MPI_COMM_WORLD
>> > >>hangs because the inter collective module is being selected.
>> > >>
>> > >>Thanks
>> > >>Edgar
>> > >
>> > >>#include <stdio.h>
>> > >>#include "mpi.h"
>> > >>
>> > >>int main( int argc, char *argv[] )
>> > >>{
>> > >>   MPI_Comm comm1;
>> > >>   int root=0;
>> > >>   int rank2, size2, global_buf=1;
>> > >>   int rank, size;
>> > >>
>> > >>   MPI_Init ( &argc, &argv );
>> > >>
>> > >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
>> > >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
>> > >>
>> > >>/* Setting up a new communicator */
>> > >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
>> > >>
>> > >>   MPI_Comm_size ( comm1, &size2 );
>> > >>   MPI_Comm_rank ( comm1, &rank2 );
>> > >>
>> > >>
>> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
>> > >>   if ( rank == root ) {
>> > >>       printf("Bcast on MPI_COMM_WORLD finished\n");
>> > >>   }
>> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
>> > >>   if ( rank == root ) {
>> > >>       printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
>> > >>   }
>> > >>
>> > >>   MPI_Comm_free ( &comm1 );
>> > >>
>> > >>   MPI_Finalize ();
>> > >>   return ( 0 );
>> > >>}
>> > >
>> > >>_______________________________________________
>> > >>devel mailing list
>> > >>de...@open-mpi.org
>> > >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > >>Link to this post:
>> > >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
>> > >
>> > >
>> > >
>> > >_______________________________________________
>> > >devel mailing list
>> > >de...@open-mpi.org
>> > >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > >Link to this post:
>> > >http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
>> > >
>> > _______________________________________________
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Link to this post:
>> > http://www.open-mpi.org/community/lists/devel/2015/09/18043.php
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18049.php
>>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18057.php
>
