Nathan?

On Oct 1, 2014, at 10:27 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> I see no change in the topo interface in any of the patches attached to this 
> thread. Is there any other patch related to this discussion?
> 
>  George.
> 
> 
> 
>> On Oct 1, 2014, at 14:52, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> 
>>> On Oct 1, 2014, at 6:48 AM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@iferc.org> wrote:
>>> 
>>> oops, i missed graph and dist graph :-(
>>> 
>>> the attached patch fixes this
>>> /* it was straightforward for graph and a bit more challenging for dist
>>> graph */
>>> 
>>> i am fine for Nathan's patch for v1.8
>>> /* v1.8 and trunk have already a different way to handle a memory leak */
>>> /* the attached patch is vs v1.8 but it is for review only, i will port
>>> it to the trunk */
>>> 
>>> imho, there is no ABI break with this patch.
>>> out of curiosity, what is the ABI break that is/will be introduced with
>>> the "real" fix ?
>> 
>> I haven't looked at the code at all, but Nathan said it changed the topo 
>> framework interface.  That's the ABI I was referring to -- not the MPI ABI.  
>> Sorry for the confusions.
>> 
>>> FYI, i also enhanced the collective/neighbor_allgather_self from the ibm
>>> test suite in order to test
>>> graph and dist_graph
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> 
>>>> On 2014/10/01 0:58, Jeff Squyres (jsquyres) wrote:
>>>> On the call today, we decided that Nathan's v1.8 patch is sufficient for 
>>>> the v1.8 series, and the "real" fix will be applied to the trunk and 
>>>> carried forward to the v1.9 series (since it introduces an ABI break for 
>>>> the topo framework, which we try not to do in the middle of a release 
>>>> series).
>>>> 
>>>> On Sep 30, 2014, at 10:40 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
>>>> 
>>>>> Not quite right. There still is no topology information at collective
>>>>> selection time for either graph or dist graph.
>>>>> 
>>>>> -Nathan
>>>>> 
>>>>>> On Tue, Sep 30, 2014 at 02:03:27PM +0900, Gilles Gouaillardet wrote:
>>>>>> Nathan,
>>>>>> 
>>>>>> here is a revision of the previously attached patch, and that supports
>>>>>> graph and dist graph.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Gilles
>>>>>> 
>>>>>> On 2014/09/30 0:05, Nathan Hjelm wrote:
>>>>>> 
>>>>>> An equivalent change would need to be made for graph and dist graph as
>>>>>> well. That will take a little more work. Also, I was avoiding changing
>>>>>> anything in topo for 1.8.
>>>>>> 
>>>>>> -Nathan
>>>>>> 
>>>>>> On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote:
>>>>>> 
>>>>>> Nathan,
>>>>>> 
>>>>>> why not just make the topology information available at that point as you
>>>>>> described it ?
>>>>>> 
>>>>>> the attached patch does this, could you please review it ?
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Gilles
>>>>>> 
>>>>>> On 2014/09/26 2:50, Nathan Hjelm wrote:
>>>>>> 
>>>>>> On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
>>>>>> 
>>>>>> I finally managed to track down some issues in mpi4py's test suite
>>>>>> using Open MPI 1.8+. The code below should be enough to reproduce the
>>>>>> problem. Run it under valgrind to make sense of my following
>>>>>> diagnostics.
>>>>>> 
>>>>>> In this code I'm creating a 2D, periodic Cartesian topology out of
>>>>>> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>>>>>> links to itself. So we have size=1 but indegree=outdegree=4. However,
>>>>>> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" request are
>>>>>> being allocated to manage communication:
>>>>>> 
>>>>>>   if (OMPI_COMM_IS_INTER(comm)) {
>>>>>>       size = ompi_comm_remote_size(comm);
>>>>>>   } else {
>>>>>>       size = ompi_comm_size(comm);
>>>>>>   }
>>>>>>   basic_module->mccb_num_reqs = size * 2;
>>>>>>   basic_module->mccb_reqs = (ompi_request_t**)
>>>>>>       malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>>>>> 
>>>>>> I guess you have to also special-case for topologies and allocate
>>>>>> indegree+outdegree requests (not sure about this number, just
>>>>>> guessing).
>>>>>> 
>>>>>> 
>>>>>> I wish this was possible but the topology information is not available
>>>>>> at that point. We may be able to change that but I don't see the work
>>>>>> completing anytime soon. I committed an alternative fix as r32796 and
>>>>>> CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
>>>>>> produces a SEGV. Let me know if you run into any more issues.
>>>>>> 
>>>>>> 
>>>>>> -Nathan
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15915.php
>>>>>> 
>>>>>> Index: ompi/mca/topo/base/topo_base_cart_create.c
>>>>>> ===================================================================
>>>>>> --- ompi/mca/topo/base/topo_base_cart_create.c  (revision 32807)
>>>>>> +++ ompi/mca/topo/base/topo_base_cart_create.c  (working copy)
>>>>>> @@ -163,10 +163,18 @@
>>>>>>       return MPI_ERR_INTERN;
>>>>>>   }
>>>>>> 
>>>>>> +    assert(NULL == new_comm->c_topo);
>>>>>> +    assert(!(new_comm->c_flags & OMPI_COMM_CART));
>>>>>> +    new_comm->c_topo           = topo;
>>>>>> +    new_comm->c_topo->mtc.cart = cart;
>>>>>> +    new_comm->c_topo->reorder  = reorder;
>>>>>> +    new_comm->c_flags         |= OMPI_COMM_CART;
>>>>>>   ret = ompi_comm_enable(old_comm, new_comm,
>>>>>>                          new_rank, num_procs, topo_procs);
>>>>>>   if (OMPI_SUCCESS != ret) {
>>>>>>       /* something wrong happened during setting the communicator */
>>>>>> +        new_comm->c_topo = NULL;
>>>>>> +        new_comm->c_flags &= ~OMPI_COMM_CART;
>>>>>>       ompi_comm_free (&new_comm);
>>>>>>       free(topo_procs);
>>>>>>       if(NULL != cart->periods) free(cart->periods);
>>>>>> @@ -176,10 +184,6 @@
>>>>>>       return ret;
>>>>>>   }
>>>>>> 
>>>>>> -    new_comm->c_topo           = topo;
>>>>>> -    new_comm->c_topo->mtc.cart = cart;
>>>>>> -    new_comm->c_topo->reorder  = reorder;
>>>>>> -    new_comm->c_flags         |= OMPI_COMM_CART;
>>>>>>   *comm_topo = new_comm;
>>>>>> 
>>>>>>   if( MPI_UNDEFINED == new_rank ) {
>>>>>> Index: ompi/mca/coll/basic/coll_basic_module.c
>>>>>> ===================================================================
>>>>>> --- ompi/mca/coll/basic/coll_basic_module.c     (revision 32807)
>>>>>> +++ ompi/mca/coll/basic/coll_basic_module.c     (working copy)
>>>>>> @@ -13,6 +13,8 @@
>>>>>> * Copyright (c) 2012      Sandia National Laboratories. All rights 
>>>>>> reserved.
>>>>>> * Copyright (c) 2013      Los Alamos National Security, LLC. All rights
>>>>>> *                         reserved.
>>>>>> + * Copyright (c) 2014      Research Organization for Information Science
>>>>>> + *                         and Technology (RIST). All rights reserved.
>>>>>> * $COPYRIGHT$
>>>>>> *
>>>>>> * Additional copyrights may follow
>>>>>> @@ -28,6 +30,7 @@
>>>>>> #include "mpi.h"
>>>>>> #include "ompi/mca/coll/coll.h"
>>>>>> #include "ompi/mca/coll/base/base.h"
>>>>>> +#include "ompi/mca/topo/topo.h"
>>>>>> #include "coll_basic.h"
>>>>>> 
>>>>>> 
>>>>>> @@ -70,6 +73,15 @@
>>>>>>   } else {
>>>>>>       size = ompi_comm_size(comm);
>>>>>>   }
>>>>>> +    if (comm->c_flags & OMPI_COMM_CART) {
>>>>>> +        int cart_size;
>>>>>> +        assert (NULL != comm->c_topo);
>>>>>> +        comm->c_topo->topo.cart.cartdim_get(comm, &cart_size);
>>>>>> +        cart_size *= 2;
>>>>>> +        if (cart_size > size) {
>>>>>> +            size = cart_size;
>>>>>> +        }
>>>>>> +    }
>>>>>>   basic_module->mccb_num_reqs = size * 2;
>>>>>>   basic_module->mccb_reqs = (ompi_request_t**)
>>>>>>       malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15929.php
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15930.php
>>>>>> Index: ompi/mca/topo/base/topo_base_cart_create.c
>>>>>> ===================================================================
>>>>>> --- ompi/mca/topo/base/topo_base_cart_create.c    (revision 32815)
>>>>>> +++ ompi/mca/topo/base/topo_base_cart_create.c    (working copy)
>>>>>> @@ -163,10 +163,18 @@
>>>>>>      return MPI_ERR_INTERN;
>>>>>>  }
>>>>>> 
>>>>>> +    assert(NULL == new_comm->c_topo);
>>>>>> +    assert(!(new_comm->c_flags & OMPI_COMM_CART));
>>>>>> +    new_comm->c_topo           = topo;
>>>>>> +    new_comm->c_topo->mtc.cart = cart;
>>>>>> +    new_comm->c_topo->reorder  = reorder;
>>>>>> +    new_comm->c_flags         |= OMPI_COMM_CART;
>>>>>>  ret = ompi_comm_enable(old_comm, new_comm,
>>>>>>                         new_rank, num_procs, topo_procs);
>>>>>>  if (OMPI_SUCCESS != ret) {
>>>>>>      /* something wrong happened during setting the communicator */
>>>>>> +        new_comm->c_topo = NULL;
>>>>>> +        new_comm->c_flags &= ~OMPI_COMM_CART;
>>>>>>      ompi_comm_free (&new_comm);
>>>>>>      free(topo_procs);
>>>>>>      if(NULL != cart->periods) free(cart->periods);
>>>>>> @@ -176,10 +184,6 @@
>>>>>>      return ret;
>>>>>>  }
>>>>>> 
>>>>>> -    new_comm->c_topo           = topo;
>>>>>> -    new_comm->c_topo->mtc.cart = cart;
>>>>>> -    new_comm->c_topo->reorder  = reorder;
>>>>>> -    new_comm->c_flags         |= OMPI_COMM_CART;
>>>>>>  *comm_topo = new_comm;
>>>>>> 
>>>>>>  if( MPI_UNDEFINED == new_rank ) {
>>>>>> Index: ompi/mca/coll/basic/coll_basic_module.c
>>>>>> ===================================================================
>>>>>> --- ompi/mca/coll/basic/coll_basic_module.c    (revision 32815)
>>>>>> +++ ompi/mca/coll/basic/coll_basic_module.c    (working copy)
>>>>>> @@ -13,6 +13,8 @@
>>>>>> * Copyright (c) 2012      Sandia National Laboratories. All rights 
>>>>>> reserved.
>>>>>> * Copyright (c) 2013      Los Alamos National Security, LLC. All rights
>>>>>> *                         reserved.
>>>>>> + * Copyright (c) 2014      Research Organization for Information Science
>>>>>> + *                         and Technology (RIST). All rights reserved.
>>>>>> * $COPYRIGHT$
>>>>>> * 
>>>>>> * Additional copyrights may follow
>>>>>> @@ -28,6 +30,8 @@
>>>>>> #include "mpi.h"
>>>>>> #include "ompi/mca/coll/coll.h"
>>>>>> #include "ompi/mca/coll/base/base.h"
>>>>>> +#include "ompi/mca/topo/topo.h"
>>>>>> +#include "ompi/mca/topo/base/base.h"
>>>>>> #include "coll_basic.h"
>>>>>> 
>>>>>> 
>>>>>> @@ -70,7 +74,36 @@
>>>>>>  } else {
>>>>>>      size = ompi_comm_size(comm);
>>>>>>  }
>>>>>> -    basic_module->mccb_num_reqs = size * 2;
>>>>>> +    size *= 2;
>>>>>> +    if (OMPI_COMM_IS_CART(comm)) {
>>>>>> +        int cart_size;
>>>>>> +        mca_topo_base_comm_cart_2_1_0_t *cart;
>>>>>> +        assert (NULL != comm->c_topo);
>>>>>> +        cart = comm->c_topo->mtc.cart;
>>>>>> +        cart_size = cart->ndims * 4;
>>>>>> +        if (cart_size > size) {
>>>>>> +            size = cart_size;
>>>>>> +        }
>>>>>> +    } else if (OMPI_COMM_IS_GRAPH(comm)) {
>>>>>> +        int rank, degree;
>>>>>> +        assert (NULL != comm->c_topo);
>>>>>> +        rank = ompi_comm_rank (comm);
>>>>>> +        mca_topo_base_graph_neighbors_count (comm, rank, &degree);
>>>>>> +        degree *= 2;
>>>>>> +        if (degree > size) {
>>>>>> +            size = degree;
>>>>>> +        }
>>>>>> +    } else if (OMPI_COMM_IS_DIST_GRAPH(comm)) {
>>>>>> +        int dist_graph_size;
>>>>>> +        mca_topo_base_comm_dist_graph_2_1_0_t *dist_graph;
>>>>>> +        assert (NULL != comm->c_topo);
>>>>>> +        dist_graph = comm->c_topo->mtc.dist_graph;
>>>>>> +        dist_graph_size = dist_graph->indegree + dist_graph->outdegree;
>>>>>> +        if (dist_graph_size > size) {
>>>>>> +            size = dist_graph_size;
>>>>>> +        }
>>>>>> +    }
>>>>>> +    basic_module->mccb_num_reqs = size;
>>>>>>  basic_module->mccb_reqs = (ompi_request_t**) 
>>>>>>      malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15941.php
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: 
>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15945.php
>>> 
>>> <mmcb.v3.patch>_______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/10/15957.php
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/10/15959.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/10/15960.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

Reply via email to