Re: [OMPI devel] [v2.x] opal_list debug ref-count bug?

Paul Hargrove Tue, 10 May 2016 17:11:25 -0400 (EDT)

Thanks, George, I got it now.
I had missed the fact that opal_atomic_add() is a macro that applies
sizeof() to call the correct atomic function.
So, I believe that Nathan's fixes for the 64-bit atomics support detection
should be the only thing needed.


-Paul

On Tue, May 10, 2016 at 1:50 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> The opal_atomic_add function contains all the possible calls to the
> underlying atomic functions, based on the size of the data. Thus, if we
> detect that 64 bits atomics are available, and if the compiler doesn't
> automatically remove the unnecessary code (from the switch case with a
> const), then we will [wrongfully] generate the stub for calling the 64 bits
> atomic operation.
>
>   George.
>
>
>
> On Tue, May 10, 2016 at 3:48 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> George,
>>
>> I believe that the 64-bit atomics support detection issue that you are
>> suspecting is already covered by Nathan's PR1659 on master, and will be
>> PRed/CMRed for v2.x after that is merged.
>>
>> Regardless of that, however, since the field is declared as "volatile
>> int32_t" the opal_list code does need to be fixed to use 32-bit operations
>> unconditionally, right?
>>
>> -Paul
>>
>> On Tue, May 10, 2016 at 12:36 PM, George Bosilca <bosi...@icl.utk.edu>
>> wrote:
>>
>>> Paul,
>>>
>>> I think the ref_count should always be manipulated with atomic
>>> operations, otherwise we can't use them for internal, thread-safe,
>>> purposes. That being said the issue at hand seems a little different. The
>>> difference in the generated code between the opal_atomic_add and the
>>> OPAL_THREAD_ADD32, is that the macro is explicitly calling
>>> opal_atomic_add32, while the generic atomic_add has a switch inside (to
>>> select between atomics operations on different type). For the error you
>>> mention to happen our configure script must have detected that there is
>>> support for 8bytes atomic operations, thus setting OPAL_HAVE_ATOMIC_ADD_64
>>> to 1.
>>>
>>> Can you take a look at the 64 bits atomic detection in the config.log
>>> and post here the corresponding output ?
>>>
>>> Thanks,
>>>   George.
>>>
>>>
>>>
>>> On Tue, May 10, 2016 at 1:38 PM, Paul Hargrove <phhargr...@lbl.gov>
>>> wrote:
>>>
>>>> I am currently working with the v2.x branch, rather than tarballs.
>>>>
>>>> While attempting to build on AIX (which is ILP32 by default), I
>>>> encountered an unexpected undefined reference to __sync_add_and_fetch_8()
>>>> from opal/class/opal_list.h.
>>>>
>>>> I found that when debugging is enabled (as in almost every build I try)
>>>> there is the following code:
>>>> #if OPAL_ENABLE_DEBUG
>>>>         /* Spot check: ensure this item is only on the list that we
>>>>            just insertted it into */
>>>>
>>>>         (void)opal_atomic_add( &(item->opal_list_item_refcount), 1 );
>>>>         assert(1 == item->opal_list_item_refcount);
>>>>         item->opal_list_item_belong_to = list;
>>>> #endif
>>>>
>>>> I am not sure why (and it may be an AIX-specific issue), but that
>>>> "opal_atomic_add()" is attempting a 64-bit add.
>>>> That is a problem, given that 'opal_list_item_refcount' is 32-bits!
>>>>
>>>> Noting that all other accesses to this field are OPAL_THREAD_ADD32(), I
>>>> suggest the following (with a bonus spell-check at no additional charge):
>>>>
>>>> --- opal/class/opal_list.c~     2016-05-10 10:20:19.000000000 -0700
>>>> +++ opal/class/opal_list.c      2016-05-10 10:29:14.000000000 -0700
>>>> @@ -142,9 +142,9 @@
>>>>
>>>>   #if OPAL_ENABLE_DEBUG
>>>>           /* Spot check: ensure this item is only on the list that we
>>>> -            just insertted it into */
>>>> +            just inserted it into */
>>>>
>>>> -         (void)opal_atomic_add( &(item->opal_list_item_refcount), 1 );
>>>> +         (void)OPAL_THREAD_ADD32( &(item->opal_list_item_refcount), 1
>>>> );
>>>>           assert(1 == item->opal_list_item_refcount);
>>>>           item->opal_list_item_belong_to = list;
>>>>   #endif
>>>>
>>>>
>>>> Source inspection shows the same mixing or opal_atomic_add() vs
>>>> OPAL_THREAD_ADD32() in master.
>>>>
>>>> -Paul
>>>>
>>>>
>>>> --
>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>> Computer Languages & Systems Software (CLaSS) Group
>>>> Computer Science Department               Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2016/05/18952.php
>>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/05/18953.php
>>>
>>
>>
>>
>> --
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department               Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/05/18954.php
>>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/18955.php
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

Re: [OMPI devel] [v2.x] opal_list debug ref-count bug?

Reply via email to