Not sure. I give a +1 for blowing them away. We can bring them back
later if needed.

-Nathan

On Wed, Sep 16, 2015 at 01:19:24PM -0400, George Bosilca wrote:
>    As they don't even compile, why are we keeping them around?
>      George.
>    On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> 
>      iboffload and bfo are opal_ignore'd by default. Neither exists in the
>      release branch.
> 
>      -Nathan
>      On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
>      >    While looking into a possible fix for this problem, we should also
>      >    clean up the leftovers from the OMPI_FREE_LIST in the trunk.
>      >    $ find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
>      >    ./opal/mca/btl/usnic/btl_usnic_compat.h:161:  OMPI_FREE_LIST_GET_MT(list, (item))
>      >    ./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:  OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item); \
>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:  OMPI_FREE_LIST_GET_MT(&cm->tasks_free, item);
>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:  OMPI_FREE_LIST_GET_MT(task_list, item);
>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:  OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:  OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:  OMPI_FREE_LIST_GET_MT(&iboffload->device->frags_free[qp_index], item);
>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:  OMPI_FREE_LIST_GET_MT(&device->frags_free[qp_index], item);
>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:  OMPI_FREE_LIST_GET_MT(&cm->collfrags_free, item);
>      >    ./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:  OMPI_FREE_LIST_GET_MT(&cm->ml_frags_free, item);
>      >    I wonder how these are even compiling ...
>      >      George.
>      >    On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca
>      >    <bosi...@icl.utk.edu> wrote:
>      >
>      >      Alexey,
>      >      This is not necessarily the fix for all cases. Most of the internal
>      >      uses of the free_list can easily accommodate the fact that no more
>      >      elements are available. Based on your description of the problem, I
>      >      would assume you encounter it once MCA_PML_OB1_RECV_REQUEST_ALLOC is
>      >      called. In this particular case the problem is the fact that we call
>      >      OMPI_FREE_LIST_GET_MT and the upper level is unable to correctly
>      >      deal with the case where the returned item is NULL. Here the real
>      >      fix is to use the blocking version of the free_list accessor
>      >      (similar to the send case), OMPI_FREE_LIST_WAIT_MT.
>      >      It is also possible that I misunderstood your problem. If the
>      >      solution above doesn't work, can you describe exactly where the NULL
>      >      return of OMPI_FREE_LIST_GET_MT is creating an issue?
>      >      George.
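For reference, the distinction George is drawing maps onto the two free-list
accessors shown below. This is only a sketch of the caller-side pattern; the
list name is taken from the grep output above, not from the actual ob1
allocation macro.

    /* Non-blocking accessor: returns immediately and may leave item NULL
     * when the list is exhausted and cannot grow, so every caller must
     * handle that case (e.g. by reporting OMPI_ERR_TEMP_OUT_OF_RESOURCE). */
    OMPI_FREE_LIST_GET_MT(&mca_pml_base_recv_requests, item);
    if (OPAL_UNLIKELY(NULL == item)) {
        /* caller-specific out-of-resource handling goes here */
    }

    /* Blocking accessor: does not return until an item is available,
     * which is the behavior the send path already relies on. */
    OMPI_FREE_LIST_WAIT_MT(&mca_pml_base_recv_requests, item);
    /* item is non-NULL here */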
>      >      On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
>      >      <avryzh...@compcenter.org> wrote:
>      >
>      >        Hi all,
>      >
>      >        We experimented with an MPI+OpenMP hybrid application
>      >        (MPI_THREAD_MULTIPLE support level) where several threads submit
>      >        a lot of MPI_Irecv() requests simultaneously, and we encountered
>      >        an intermittent OMPI_ERR_TEMP_OUT_OF_RESOURCE error after
>      >        MCA_PML_OB1_RECV_REQUEST_ALLOC() because OMPI_FREE_LIST_GET_MT()
>      >        returned NULL. Investigating this bug, we found that sometimes
>      >        the thread calling ompi_free_list_grow() doesn't have any free
>      >        items in the LIFO list at exit, because other threads have
>      >        already retrieved all the new items via opal_atomic_lifo_pop().
>      >
>      >        So we suggest changing OMPI_FREE_LIST_GET_MT() as below:
>      >
>      >        #define OMPI_FREE_LIST_GET_MT(fl, item)                                              \
>      >        {                                                                                    \
>      >            item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super));            \
>      >            if( OPAL_UNLIKELY(NULL == item) ) {                                              \
>      >                if(opal_using_threads()) {                                                   \
>      >                    int rc;                                                                  \
>      >                    opal_mutex_lock(&((fl)->fl_lock));                                       \
>      >                    do {                                                                     \
>      >                        rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);              \
>      >                        if( OPAL_UNLIKELY(rc != OMPI_SUCCESS) ) break;                       \
>      >                        item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super)); \
>      >                    } while (!item);                                                         \
>      >                    opal_mutex_unlock(&((fl)->fl_lock));                                     \
>      >                } else {                                                                     \
>      >                    ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);                       \
>      >                    item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super));    \
>      >                } /* opal_using_threads() */                                                 \
>      >            } /* NULL == item */                                                             \
>      >        }
>      >
>      >        Another workaround is to increase the value of the
>      >        pml_ob1_free_list_inc parameter.
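On the command line that would look something like the following; the
increment of 256 and the launch arguments are only placeholders, not values
suggested in the thread:

    # Grow the ob1 free lists in larger chunks so the non-blocking
    # GET_MT path runs dry less often (256 is an arbitrary example).
    mpirun --mca pml_ob1_free_list_inc 256 -np 16 ./hybrid_app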
>      >
>      >
>      >
>      >        Regards,
>      >
>      >        Alexey
