The formatting of the code got all messed up. Please send a diff and I
will take a look. The ompi free list no longer exists in master or the next
release branch, but the change may be worthwhile for the opal free list
code.

-Nathan

On Wed, Sep 16, 2015 at 04:03:44PM +0300, Алексей Рыжих wrote:
>    Hi all,
> 
>    We experimented with an MPI+OpenMP hybrid application (MPI_THREAD_MULTIPLE
>    support level) in which several threads submit many MPI_Irecv()
>    requests simultaneously, and we encountered an intermittent
>    OMPI_ERR_TEMP_OUT_OF_RESOURCE error after MCA_PML_OB1_RECV_REQUEST_ALLOC()
>    because OMPI_FREE_LIST_GET_MT() returned NULL. Investigating this bug,
>    we found that the thread calling ompi_free_list_grow() sometimes has
>    no free items left in the LIFO list on return, because other threads have
>    already retrieved all of the new items via opal_atomic_lifo_pop().
> 
>    So we suggest changing OMPI_FREE_LIST_GET_MT() as below:
> 
>    #define OMPI_FREE_LIST_GET_MT(fl, item)                                               \
>        {                                                                                 \
>            item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super));         \
>            if( OPAL_UNLIKELY(NULL == item) ) {                                           \
>                if(opal_using_threads()) {                                                \
>                    int rc;                                                               \
>                    opal_mutex_lock(&((fl)->fl_lock));                                    \
>                    do                                                                    \
>                    {                                                                     \
>                        rc = ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);           \
>                        if( OPAL_UNLIKELY(rc != OMPI_SUCCESS)) break;                     \
>                                                                                          \
>                        item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super)); \
>                    } while (!item);                                                      \
>                    opal_mutex_unlock(&((fl)->fl_lock));                                  \
>                } else {                                                                  \
>                    ompi_free_list_grow((fl), (fl)->fl_num_per_alloc);                    \
>                    item = (ompi_free_list_item_t*) opal_atomic_lifo_pop(&((fl)->super)); \
>                } /* opal_using_threads() */                                              \
>            } /* NULL == item */                                                          \
>        }
> 
>    Another workaround is to increase the value of the pml_ob1_free_list_inc
>    parameter.
> 
>     
> 
>    Regards,
> 
>    Alexey
> 
>     

> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18039.php
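
For readers following the thread, here is a minimal single-threaded sketch of
the race described in the quoted message and of the proposed retry loop. It is
not Open MPI code: item_t, free_list_t, lifo_pop, list_grow, get_once,
get_retry, list_destroy and demo are hypothetical names, and the steals_left
counter only simulates other threads draining the list right after a grow. In
the proposed macro the retry loop runs while holding fl_lock; the lock is left
out here because the interleaving is simulated rather than real.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the free list types; not the Open MPI API. */
typedef struct item { struct item *next; } item_t;

typedef struct {
    item_t *head;          /* LIFO of free items */
    size_t  num_per_alloc; /* plays the role of fl_num_per_alloc */
    int     steals_left;   /* simulation only: grows that get fully drained */
} free_list_t;

static item_t *lifo_pop(free_list_t *fl)
{
    item_t *it = fl->head;
    if (it != NULL) fl->head = it->next;
    return it;
}

/* Add num_per_alloc items. While steals_left > 0, pretend that other
 * threads popped every new item before the caller could take one. */
static int list_grow(free_list_t *fl)
{
    for (size_t i = 0; i < fl->num_per_alloc; i++) {
        item_t *it = malloc(sizeof *it);
        if (it == NULL) return -1;
        it->next = fl->head;
        fl->head = it;
    }
    if (fl->steals_left > 0) {
        item_t *it;
        fl->steals_left--;
        while ((it = lifo_pop(fl)) != NULL) free(it);
    }
    return 0;
}

/* One grow followed by one pop: can return NULL if the list was drained. */
static item_t *get_once(free_list_t *fl)
{
    item_t *it = lifo_pop(fl);
    if (it == NULL) {
        list_grow(fl);
        it = lifo_pop(fl);
    }
    return it;
}

/* Proposed shape: keep growing until a pop succeeds or grow itself fails. */
static item_t *get_retry(free_list_t *fl)
{
    item_t *it = lifo_pop(fl);
    if (it == NULL) {
        do {
            if (list_grow(fl) != 0) break;
            it = lifo_pop(fl);
        } while (it == NULL);
    }
    return it;
}

static void list_destroy(free_list_t *fl)
{
    item_t *it;
    while ((it = lifo_pop(fl)) != NULL) free(it);
}

/* Returns 0 when get_once fails and get_retry succeeds under the same
 * simulated interference (assuming malloc does not fail). */
int demo(void)
{
    free_list_t a = { NULL, 4, 3 };
    free_list_t b = { NULL, 4, 3 };
    item_t *x = get_once(&a);
    item_t *y = get_retry(&b);
    int ok = (x == NULL) && (y != NULL) && (b.steals_left == 0);
    free(x);
    free(y);
    list_destroy(&a);
    list_destroy(&b);
    return ok ? 0 : 1;
}
```

Note that the retry loop only ends when a pop succeeds or a grow fails, so
under sustained contention the growing thread may allocate several batches
before it gets an item.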
