As they don't even compile, why are we keeping them around?

  George.

On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm <hje...@lanl.gov> wrote:

> iboffload and bfo are opal ignored by default. Neither exists in the
> release branch.
>
> -Nathan
>
> On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
> > While looking into a possible fix for this problem we should also clean
> > up in the trunk the leftovers from the OMPI_FREE_LIST.
> >
> > $ find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
> > ./opal/mca/btl/usnic/btl_usnic_compat.h:161:    OMPI_FREE_LIST_GET_MT(list, (item))
> > ./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:    OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item); \
> > ./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:    OMPI_FREE_LIST_GET_MT(&cm->tasks_free, item);
> > ./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:    OMPI_FREE_LIST_GET_MT(task_list, item);
> > ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:    OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
> > ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:    OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
> > ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:    OMPI_FREE_LIST_GET_MT(&iboffload->device->frags_free[qp_index], item);
> > ./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:    OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
> > ./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:    OMPI_FREE_LIST_GET_MT(&cm->collfrags_free, item);
> > ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:    OMPI_FREE_LIST_GET_MT(&cm->ml_frags_free, item);
> >
> > I wonder how these are even compiling ...
> >
> > George.
> >
> > On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> > > Alexey,
> > >
> > > This is not necessarily the fix for all cases. Most of the internal
> > > uses of the free_list can easily accommodate the fact that no more
> > > elements are available. Based on your description of the problem, I
> > > would assume you encounter it once MCA_PML_OB1_RECV_REQUEST_ALLOC is
> > > called. In this particular case the problem is the fact that we call
> > > OMPI_FREE_LIST_GET_MT and the upper level is unable to correctly deal
> > > with the case where the returned item is NULL. Here the real fix is to
> > > use the blocking version of the free_list accessor,
> > > OMPI_FREE_LIST_WAIT_MT (as in the send case).
> > >
> > > It is also possible that I misunderstood your problem. If the solution
> > > above doesn't work, can you describe exactly where the NULL return of
> > > OMPI_FREE_LIST_GET_MT is creating an issue?
> > >
> > > George.
> > >
> > > On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
> > > <avryzh...@compcenter.org> wrote:
> > > > Hi all,
> > > >
> > > > We experimented with an MPI+OpenMP hybrid application
> > > > (MPI_THREAD_MULTIPLE support level) in which several threads submit
> > > > many MPI_Irecv() requests simultaneously, and we hit an intermittent
> > > > OMPI_ERR_TEMP_OUT_OF_RESOURCE failure after
> > > > MCA_PML_OB1_RECV_REQUEST_ALLOC() because OMPI_FREE_LIST_GET_MT()
> > > > returned NULL.
> > > > While investigating this bug we found that the thread calling
> > > > ompi_free_list_grow() sometimes has no free items left in the LIFO
> > > > list when it exits, because other threads have already retrieved all
> > > > of the new items through opal_atomic_lifo_pop().
> > > >
> > > > So we suggest changing OMPI_FREE_LIST_GET_MT() as below:
> > > >
> > > > #define OMPI_FREE_LIST_GET_MT(fl, item)                                    \
> > > > {                                                                          \
> > > >     item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super));  \
> > > >     if( OPAL_UNLIKELY(NULL == item) ) {                                    \
> > > >         if( opal_using_threads() ) {                                       \
> > > >             int rc;                                                        \
> > > >             opal_mutex_lock(&((fl)->fl_lock));                             \
> > > >             do {                                                           \
> > > >                 rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);    \
> > > >                 if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;             \
> > > >                 item = (ompi_free_list_item_t*)                            \
> > > >                        opal_atomic_lifo_pop(&((fl)->super));               \
> > > >             } while (!item);                                               \
> > > >             opal_mutex_unlock(&((fl)->fl_lock));                           \
> > > >         } else {                                                           \
> > > >             ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);             \
> > > >             item = (ompi_free_list_item_t*)                                \
> > > >                    opal_atomic_lifo_pop(&((fl)->super));                   \
> > > >         } /* opal_using_threads() */                                       \
> > > >     } /* NULL == item */                                                   \
> > > > }
> > > >
> > > > Another workaround is to increase the value of the
> > > > pml_ob1_free_list_inc parameter.
> > > >
> > > > Regards,
> > > > Alexey
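
For illustration only, here is a rough sketch of the receive-side change George
suggests: allocate the request with the blocking OMPI_FREE_LIST_WAIT_MT instead
of OMPI_FREE_LIST_GET_MT. The free-list name comes from the grep output above;
the body of MCA_PML_OB1_RECV_REQUEST_ALLOC and the mca_pml_ob1_recv_request_t
type name are assumptions made for this sketch, not the exact tree source.

    /*
     * Sketch (assumed macro body, not the actual ob1 source): the blocking
     * accessor waits until a free-list item is available, growing the list
     * if needed, instead of returning NULL the way OMPI_FREE_LIST_GET_MT
     * does, so this allocation no longer reaches the
     * OMPI_ERR_TEMP_OUT_OF_RESOURCE path.
     */
    #define MCA_PML_OB1_RECV_REQUEST_ALLOC(recvreq)                        \
    do {                                                                   \
        ompi_free_list_item_t *item;                                       \
        OMPI_FREE_LIST_WAIT_MT(&mca_pml_base_recv_requests, item);         \
        recvreq = (mca_pml_ob1_recv_request_t *) item;                     \
    } while (0)

Raising the pml_ob1_free_list_inc MCA parameter that Alexey mentions (for
example, passing "--mca pml_ob1_free_list_inc 256" to mpirun) should make the
failure less likely, but by itself it does not remove the race.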