...or in 1.5.5. How soon will you be able to tell if it fixes some hangs?
On Mar 1, 2012, at 10:56 AM, Nathan Hjelm wrote:

> Found a pretty nasty frag leak (and a minor one) in ob1 (see commit below).
> If this fix addresses some hangs we are seeing on infiniband LANL might want
> a 1.4.6 rolled (or a faster rollout for 1.6.0).
>
> -Nathan
>
> ---------- Forwarded message ----------
> Date: Thu, 1 Mar 2012 08:53:39 -0700
> From: hje...@osl.iu.edu
> Reply-To: de...@open-mpi.org
> To: s...@open-mpi.org
> Subject: [OMPI svn] svn:open-mpi r26077
>
> Author: hjelmn
> Date: 2012-03-01 10:53:39 EST (Thu, 01 Mar 2012)
> New Revision: 26077
> URL: https://svn.open-mpi.org/trac/ompi/changeset/26077
>
> Log:
> ob1: fix two fragment leaks
>  - MAJOR! get src descriptor leaks if mca_bml_base_send fails
>  - minor. descriptor leaked in mca_pml_send_request_start_copy if the btl
>    returns OMPI_ERR_RESOURCE_BUSY.
> Text files modified:
>    trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.c | 27 ++++++++++++++++-----------
>    1 files changed, 16 insertions(+), 11 deletions(-)
>
> Modified: trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.c
> ==============================================================================
> --- trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.c (original)
> +++ trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.c 2012-03-01 10:53:39 EST (Thu,
> 01 Mar 2012)
> @@ -1,3 +1,4 @@
> +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
>  /*
>   * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>   * University Research and Technology
> @@ -12,6 +13,8 @@
>   * Copyright (c) 2008 UT-Battelle, LLC. All rights reserved.
>   * Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved.
>   * Copyright (c) 2012 NVIDIA Corporation. All rights reserved.
> + * Copyright (c) 2012 Los Alamos National Security, LLC. All rights
> + * reserved.
>   * $COPYRIGHT$
>   *
>   * Additional copyrights may follow
> @@ -546,15 +549,14 @@
>          }
>          return OMPI_SUCCESS;
>      }
> -    switch(OPAL_SOS_GET_ERROR_CODE(rc)) {
> -    case OMPI_ERR_RESOURCE_BUSY:
> -        /* No more resources. Allow the upper level to queue the send */
> -        rc = OMPI_ERR_OUT_OF_RESOURCE;
> -        break;
> -    default:
> -        mca_bml_base_free(bml_btl, des);
> -        break;
> +
> +    if (OMPI_ERR_RESOURCE_BUSY == OPAL_SOS_GET_ERROR_CODE(rc)) {
> +        /* No more resources. Allow the upper level to queue the send */
> +        rc = OMPI_ERR_OUT_OF_RESOURCE;
>      }
> +
> +    mca_bml_base_free (bml_btl, des);
> +
>      return rc;
>  }
>
> @@ -631,7 +633,7 @@
>       * operation is achieved.
>       */
>
> -    mca_btl_base_descriptor_t* des;
> +    mca_btl_base_descriptor_t *des, *src = NULL;
>      mca_btl_base_segment_t* segment;
>      mca_pml_ob1_hdr_t* hdr;
>      bool need_local_cb = false;
> @@ -640,7 +642,6 @@
>      bml_btl = sendreq->req_rdma[0].bml_btl;
>      if((sendreq->req_rdma_cnt == 1) && (bml_btl->btl_flags &
> (MCA_BTL_FLAGS_GET | MCA_BTL_FLAGS_CUDA_GET))) {
>          mca_mpool_base_registration_t* reg = sendreq->req_rdma[0].btl_reg;
> -        mca_btl_base_descriptor_t* src;
>          size_t i;
>          size_t old_position =
> sendreq->req_send.req_base.req_convertor.bConverted;
>
> @@ -781,6 +782,10 @@
>          return OMPI_SUCCESS;
>      }
>      mca_bml_base_free(bml_btl, des);
> +    if (NULL != src) {
> +        mca_bml_base_free (bml_btl, src);
> +    }
> +
>      return rc;
>  }
>
> @@ -1144,7 +1149,7 @@
>                               0,
>                               &frag->rdma_length,
>                               MCA_BTL_DES_FLAGS_BTL_OWNERSHIP |
> -                             MCA_BTL_DES_FLAGS_PUT,
> +                             MCA_BTL_DES_FLAGS_PUT,
>                               &des );
>
>      if( OPAL_UNLIKELY(NULL == des) ) {
> _______________________________________________
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
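
For anyone skimming the r26077 log, the "minor" leak comes down to one error path (the RESOURCE_BUSY case) returning without freeing the descriptor. Here is a minimal standalone sketch of the pattern the patch moves to: translate the error code if needed, then free unconditionally on every failure path. Nothing in it is Open MPI code. All `demo_*` names and `DEMO_*` codes are hypothetical stand-ins, and the mock transport assumes that a successful send takes ownership of the descriptor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical return codes, standing in for the OMPI_* codes in the diff. */
enum {
    DEMO_SUCCESS             =  0,
    DEMO_ERR_RESOURCE_BUSY   = -1,
    DEMO_ERR_OUT_OF_RESOURCE = -2,
    DEMO_ERR_OTHER           = -3
};

/* Hypothetical descriptor type (not mca_btl_base_descriptor_t). */
typedef struct demo_desc { int payload; } demo_desc_t;

/* Descriptors allocated but not yet freed; nonzero after a call means a leak. */
static int demo_outstanding = 0;

static demo_desc_t *demo_alloc(void)
{
    demo_desc_t *d = malloc(sizeof(*d));
    if (NULL != d) {
        d->payload = 0;
        demo_outstanding++;
    }
    return d;
}

static void demo_free(demo_desc_t *d)
{
    assert(NULL != d);
    free(d);
    demo_outstanding--;
}

/* Mock transport: returns the code the caller forces. ASSUMPTION: on success
 * it takes ownership of the descriptor and releases it itself. */
static int demo_send(demo_desc_t *d, int forced_rc)
{
    if (DEMO_SUCCESS == forced_rc) {
        demo_free(d);
    }
    return forced_rc;
}

/* Patched pattern: map "busy" to "out of resource" so the caller can queue
 * the send, then free the descriptor on every failure path. */
static int demo_start_copy(int forced_rc)
{
    demo_desc_t *des = demo_alloc();
    if (NULL == des) {
        return DEMO_ERR_OUT_OF_RESOURCE;
    }

    int rc = demo_send(des, forced_rc);
    if (DEMO_SUCCESS == rc) {
        return DEMO_SUCCESS;
    }

    if (DEMO_ERR_RESOURCE_BUSY == rc) {
        rc = DEMO_ERR_OUT_OF_RESOURCE;
    }

    demo_free(des);  /* unconditional: the old switch skipped this when busy */
    return rc;
}
```

Calling `demo_start_copy(DEMO_ERR_RESOURCE_BUSY)` and then checking `demo_outstanding` shows whether the busy path still holds a descriptor. With the free moved back into a `default:` branch of a switch, the counter would stay at 1 after that call.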