Thanks, Ralph. I will add it to one of the UH Jenkins scripts.

----------

sent from my smart phonr so no good type.

Howard
On Sep 16, 2015 10:28 PM, "Ralph Castain" <r...@open-mpi.org> wrote:

> Actually, Edgar attached a simple reproducer to the first message in this
> thread.
>
>
> On Wed, Sep 16, 2015 at 7:27 PM, Howard Pritchard <hpprit...@gmail.com>
> wrote:
>
>> Edgar
>>
>> Do you have a simple test we could run with jenkins ghprb that would
>> catch this going forward?
>>
>> I could add it to some of the checks we run on your UH slave node.
>>
>> Howard
>>
>> ----------
>>
>> sent from my smart phonr so no good type.
>>
>> Howard
>> On Sep 16, 2015 12:36 PM, "Nathan Hjelm" <hje...@lanl.gov> wrote:
>>
>>>
>>> I see the problem. Before my changes ompi_comm_dup signalled that the
>>> communicator was not an inter-communicator by setting remote_size to
>>> 0. The remote size is now taken from the remote group if one was
>>> supplied (which is the case with intra-communicators), so ompi_comm_dup
>>> needs to make sure NULL is passed for the remote_group when duplicating
>>> intra-communicators.
>>>
>>> I opened a PR. Once jenkins finishes I will merge it onto master.
>>>
>>> -Nathan
>>>
>>> On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
>>> > Yes, I did a fresh pull this morning; for me it deadlocks reliably
>>> > with 2 or more processes.
>>> >
>>> > Thanks
>>> > Edgar
>>> >
>>> > On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
>>> > >
>>> > >The reproducer is working for me with master on OS X 10.10. Some changes
>>> > >to ompi_comm_set went in yesterday. Are you on the latest hash?
>>> > >
>>> > >-Nathan
>>> > >
>>> > >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
>>> > >>something is borked right now on master in the management of inter vs.
>>> > >>intra communicators. It looks like intra communicators are wrongly
>>> > >>selecting the inter coll module thinking that it is an inter
>>> > >>communicator, and we have hangs because of that. I attach a small
>>> > >>replicator, where a bcast of a duplicate of MPI_COMM_WORLD hangs,
>>> > >>because the inter collective module is being selected.
>>> > >>
>>> > >>Thanks
>>> > >>Edgar
>>> > >
>>> > >>#include <stdio.h>
>>> > >>#include "mpi.h"
>>> > >>
>>> > >>int main( int argc, char *argv[] )
>>> > >>{
>>> > >>   MPI_Comm comm1;
>>> > >>   int root=0;
>>> > >>   int rank2, size2, global_buf=1;
>>> > >>   int rank, size;
>>> > >>
>>> > >>   MPI_Init ( &argc, &argv );
>>> > >>
>>> > >>   MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
>>> > >>   MPI_Comm_size ( MPI_COMM_WORLD, &size );
>>> > >>
>>> > >>/* Setting up a new communicator */
>>> > >>   MPI_Comm_dup ( MPI_COMM_WORLD, &comm1 );
>>> > >>
>>> > >>   MPI_Comm_size ( comm1, &size2 );
>>> > >>   MPI_Comm_rank ( comm1, &rank2 );
>>> > >>
>>> > >>
>>> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, MPI_COMM_WORLD );
>>> > >>   if ( rank == root ) {
>>> > >>       printf("Bcast on MPI_COMM_WORLD finished\n");
>>> > >>   }
>>> > >>   MPI_Bcast ( &global_buf, 1, MPI_INT, root, comm1 );
>>> > >>   if ( rank == root ) {
>>> > >>       printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
>>> > >>   }
>>> > >>
>>> > >>   MPI_Comm_free ( &comm1 );
>>> > >>
>>> > >>   MPI_Finalize ();
>>> > >>   return ( 0 );
>>> > >>}
>>> > >
>>> > >>_______________________________________________
>>> > >>devel mailing list
>>> > >>de...@open-mpi.org
>>> > >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> > >>Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
>>> > >
>>> > >
>>> > >
>>> > >
>>>
>>>
>>
>>
>
>
>
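
For readers following the thread, Nathan's diagnosis can be illustrated with a small stand-alone model in C. This is only a sketch under stated assumptions: group_t, comm_t, comm_set, comm_is_inter, comm_dup_buggy and comm_dup_fixed are hypothetical names invented for illustration and are not the Open MPI internals. The model also assumes that an intra-communicator keeps its local group in the remote-group slot, which is what makes forwarding that pointer misleading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Open MPI structures. */
typedef struct {
    int size;                     /* number of processes in the group */
} group_t;

typedef struct {
    const group_t *local_group;
    const group_t *remote_group;  /* assumption: aliases local_group for intra-communicators */
    int remote_size;              /* 0 means "not an inter-communicator" */
} comm_t;

/* Models the behaviour described in the thread: the remote size is taken
 * from the remote group whenever a remote group is supplied. */
void comm_set(comm_t *comm, const group_t *local, const group_t *remote)
{
    assert(comm != NULL && local != NULL);
    comm->local_group = local;
    if (remote != NULL) {
        comm->remote_group = remote;
        comm->remote_size = remote->size;
    } else {
        comm->remote_group = local;
        comm->remote_size = 0;
    }
}

/* In this model a communicator counts as "inter" when remote_size > 0. */
int comm_is_inter(const comm_t *comm)
{
    return comm->remote_size > 0;
}

/* Buggy duplicate: always forwards the stored remote group, so the copy of
 * an intra-communicator ends up with a non-zero remote size. */
void comm_dup_buggy(const comm_t *src, comm_t *dst)
{
    comm_set(dst, src->local_group, src->remote_group);
}

/* Fixed duplicate: passes NULL as the remote group for intra-communicators. */
void comm_dup_fixed(const comm_t *src, comm_t *dst)
{
    const group_t *remote = comm_is_inter(src) ? src->remote_group : NULL;
    comm_set(dst, src->local_group, remote);
}
```

In this model, duplicating an intra-communicator built with comm_set(&world, &g, NULL) through comm_dup_buggy yields a copy that comm_is_inter reports as an inter-communicator, while comm_dup_fixed keeps it an intra-communicator. That mirrors the hang Edgar reported, where the inter collective module was selected for a duplicate of MPI_COMM_WORLD.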
