The bfo was my creation many years ago.  Can we keep it around for a little 
longer?  If we blow it away, then we should probably clean up all the code I 
also have in the openib BTL for supporting failover.  Some configure code 
would have to go as well.
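
A rough sketch of where those pieces live, for anyone scoping the cleanup 
later (the configure flag name here is from memory, so treat it as an 
assumption and verify against the tree):

    # configure code that would have to go with bfo
    ./configure --enable-btl-openib-failover

    # failover support code in the openib BTL
    grep -rn failover opal/mca/btl/openib/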

Rolf

>-----Original Message-----
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>Sent: Wednesday, September 16, 2015 1:43 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()
>
>Not sure. I give a +1 for blowing them away. We can bring them back later if
>needed.
>
>-Nathan
>
>On Wed, Sep 16, 2015 at 01:19:24PM -0400, George Bosilca wrote:
>>    As they don't even compile, why are we keeping them around?
>>      George.
>>    On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>>
>>      iboffload and bfo are opal ignored by default. Neither exists in the
>>      release branch.
>>
>>      -Nathan
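>>
>>      (For context on "ignored": a sketch of the mechanism, with marker
>>      file names from memory, so verify against autogen.pl. A component
>>      directory containing a .ompi_ignore file is skipped by autogen
>>      unless a matching .ompi_unignore file is also present, e.g.:
>>
>>          ls ompi/mca/pml/bfo/.ompi_ignore
>>      )
>>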
>>      On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
>>      >    While looking into a possible fix for this problem, we should
>>      >    also clean up in the trunk the leftovers from OMPI_FREE_LIST:
>>      >    $ find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
>>      >    ./opal/mca/btl/usnic/btl_usnic_compat.h:161: OMPI_FREE_LIST_GET_MT(list, (item))
>>      >    ./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89: OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item); \
>>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149: OMPI_FREE_LIST_GET_MT(&cm->tasks_free, item);
>>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206: OMPI_FREE_LIST_GET_MT(task_list, item);
>>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107: OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146: OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208: OMPI_FREE_LIST_GET_MT(&iboffload->device->frags_free[qp_index], item);
>>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156: OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130: OMPI_FREE_LIST_GET_MT(&cm->collfrags_free, item);
>>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115: OMPI_FREE_LIST_GET_MT(&cm->ml_frags_free, item);
>>      >    I wonder how these are even compiling ...
>>      >      George.
>>      >    On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca
>>      >    <bosi...@icl.utk.edu> wrote:
>>      >
>>      >      Alexey,
>>      >      This is not necessarily the fix for all cases. Most of the
>>      >      internal uses of the free_list can easily accommodate the fact
>>      >      that no more elements are available. Based on your description
>>      >      of the problem, I would assume you encounter it once
>>      >      MCA_PML_OB1_RECV_REQUEST_ALLOC is called. In that particular
>>      >      case the problem is the fact that we call OMPI_FREE_LIST_GET_MT
>>      >      and the upper level is unable to correctly deal with the case
>>      >      where the returned item is NULL. The real fix there is to use
>>      >      the blocking version of the free_list accessor,
>>      >      OMPI_FREE_LIST_WAIT_MT (similar to the send case).
>>      >      It is also possible that I misunderstood your problem. If the
>>      >      solution above doesn't work, can you describe exactly where the
>>      >      NULL return of OMPI_FREE_LIST_GET_MT is creating an issue?
>>      >      George.
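>>      >
>>      >      A minimal sketch of that change at the receive-request
>>      >      allocation site, reusing only names already quoted in this
>>      >      thread (the exact surrounding macro in the ob1 PML may differ):
>>      >
>>      >        /* Non-blocking accessor: may hand back NULL when the free
>>      >         * list is exhausted and cannot grow. */
>>      >        OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item);
>>      >
>>      >        /* Blocking accessor: progresses until an item becomes
>>      >         * available, as the send path already does, so the caller
>>      >         * never sees a NULL item. */
>>      >        OMPI_FREE_LIST_WAIT_MT(&mca_pml_base_recv_requests, item);
>>      >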
>>      >      On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
>>      >      <avryzh...@compcenter.org> wrote:
>>      >
>>      >        Hi all,
>>      >
>>      >        We experimented with an MPI+OpenMP hybrid application
>>      >        (MPI_THREAD_MULTIPLE support level) where several threads
>>      >        submit a lot of MPI_Irecv() requests simultaneously, and we
>>      >        encountered an intermittent OMPI_ERR_TEMP_OUT_OF_RESOURCE
>>      >        failure after MCA_PML_OB1_RECV_REQUEST_ALLOC() because
>>      >        OMPI_FREE_LIST_GET_MT() returned NULL. Investigating this
>>      >        bug, we found that the thread calling ompi_free_list_grow()
>>      >        sometimes has no free items in the LIFO list on exit,
>>      >        because other threads have already retrieved all the new
>>      >        items via opal_atomic_lifo_pop().
>>      >
>>      >        So we suggest changing OMPI_FREE_LIST_GET_MT() as shown below:
>>      >
>>      >        #define OMPI_FREE_LIST_GET_MT(fl, item)                                 \
>>      >        {                                                                       \
>>      >            item = (ompi_free_list_item_t*)                                     \
>>      >                opal_atomic_lifo_pop(&((fl)->super));                           \
>>      >            if( OPAL_UNLIKELY(NULL == item) ) {                                 \
>>      >                if( opal_using_threads() ) {                                    \
>>      >                    int rc;                                                     \
>>      >                    opal_mutex_lock(&((fl)->fl_lock));                          \
>>      >                    do {                                                        \
>>      >                        rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc); \
>>      >                        if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;          \
>>      >                        item = (ompi_free_list_item_t*)                         \
>>      >                            opal_atomic_lifo_pop(&((fl)->super));               \
>>      >                    } while (!item);                                            \
>>      >                    opal_mutex_unlock(&((fl)->fl_lock));                        \
>>      >                } else {                                                        \
>>      >                    ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);          \
>>      >                    item = (ompi_free_list_item_t*)                             \
>>      >                        opal_atomic_lifo_pop(&((fl)->super));                   \
>>      >                } /* opal_using_threads() */                                    \
>>      >            } /* NULL == item */                                                \
>>      >        }
>>      >
>>      >        Another workaround is to increase the value of the
>>      >        pml_ob1_free_list_inc MCA parameter.
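>>      >
>>      >        For example (the parameter name is as above; the increment
>>      >        value and the rest of the command line are only
>>      >        illustrative):
>>      >
>>      >            mpirun --mca pml_ob1_free_list_inc 256 -np 8 ./a.out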
>>      >
>>      >
>>      >
>>      >        Regards,
>>      >
>>      >        Alexey
>>      >
>>      >
>>      >