Thanks, George, I got it now. I had missed the fact that opal_atomic_add() is a macro that applies sizeof() to call the correct atomic function. So, I believe that Nathan's fixes for the 64-bit atomics support detection should be the only thing needed.
-Paul On Tue, May 10, 2016 at 1:50 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > The opal_atomic_add function contains all the possible calls to the > underlying atomic functions, based on the size of the data. Thus, if we > detect that 64 bits atomics are available, and if the compiler doesn't > automatically remove the unnecessary code (from the switch case with a > const), then we will [wrongfully] generate the stub for calling the 64 bits > atomic operation. > > George. > > > > On Tue, May 10, 2016 at 3:48 PM, Paul Hargrove <phhargr...@lbl.gov> wrote: > >> George, >> >> I believe that the 64-bit atomics support detection issue that you are >> suspecting is already covered by Nathan's PR1659 on master, and will be >> PRed/CMRed for v2.x after that is merged. >> >> Regardless of that, however, since the field is declared as "volatile >> int32_t" the opal_list code does need to be fixed to use 32-bit operations >> unconditionally, right? >> >> -Paul >> >> On Tue, May 10, 2016 at 12:36 PM, George Bosilca <bosi...@icl.utk.edu> >> wrote: >> >>> Paul, >>> >>> I think the ref_count should always be manipulated with atomic >>> operations, otherwise we can't use them for internal, thread-safe, >>> purposes. That being said the issue at hand seems a little different. The >>> difference in the generated code between the opal_atomic_add and the >>> OPAL_THREAD_ADD32, is that the macro is explicitly calling >>> opal_atomic_add32, while the generic atomic_add has a switch inside (to >>> select between atomics operations on different type). For the error you >>> mention to happen our configure script must have detected that there is >>> support for 8bytes atomic operations, thus setting OPAL_HAVE_ATOMIC_ADD_64 >>> to 1. >>> >>> Can you take a look at the 64 bits atomic detection in the config.log >>> and post here the corresponding output ? >>> >>> Thanks, >>> George. >>> >>> >>> >>> On Tue, May 10, 2016 at 1:38 PM, Paul Hargrove <phhargr...@lbl.gov> >>> wrote: >>> >>>> I am currently working with the v2.x branch, rather than tarballs. >>>> >>>> While attempting to build on AIX (which is ILP32 by default), I >>>> encountered an unexpected undefined reference to __sync_add_and_fetch_8() >>>> from opal/class/opal_list.h. >>>> >>>> I found that when debugging is enabled (as in almost every build I try) >>>> there is the following code: >>>> #if OPAL_ENABLE_DEBUG >>>> /* Spot check: ensure this item is only on the list that we >>>> just insertted it into */ >>>> >>>> (void)opal_atomic_add( &(item->opal_list_item_refcount), 1 ); >>>> assert(1 == item->opal_list_item_refcount); >>>> item->opal_list_item_belong_to = list; >>>> #endif >>>> >>>> I am not sure why (and it may be an AIX-specific issue), but that >>>> "opal_atomic_add()" is attempting a 64-bit add. >>>> That is a problem, given that 'opal_list_item_refcount' is 32-bits! >>>> >>>> Noting that all other accesses to this field are OPAL_THREAD_ADD32(), I >>>> suggest the following (with a bonus spell-check at no additional charge): >>>> >>>> --- opal/class/opal_list.c~ 2016-05-10 10:20:19.000000000 -0700 >>>> +++ opal/class/opal_list.c 2016-05-10 10:29:14.000000000 -0700 >>>> @@ -142,9 +142,9 @@ >>>> >>>> #if OPAL_ENABLE_DEBUG >>>> /* Spot check: ensure this item is only on the list that we >>>> - just insertted it into */ >>>> + just inserted it into */ >>>> >>>> - (void)opal_atomic_add( &(item->opal_list_item_refcount), 1 ); >>>> + (void)OPAL_THREAD_ADD32( &(item->opal_list_item_refcount), 1 >>>> ); >>>> assert(1 == item->opal_list_item_refcount); >>>> item->opal_list_item_belong_to = list; >>>> #endif >>>> >>>> >>>> Source inspection shows the same mixing or opal_atomic_add() vs >>>> OPAL_THREAD_ADD32() in master. >>>> >>>> -Paul >>>> >>>> >>>> -- >>>> Paul H. Hargrove phhargr...@lbl.gov >>>> Computer Languages & Systems Software (CLaSS) Group >>>> Computer Science Department Tel: +1-510-495-2352 >>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 >>>> >>>> _______________________________________________ >>>> devel mailing list >>>> de...@open-mpi.org >>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel >>>> Link to this post: >>>> http://www.open-mpi.org/community/lists/devel/2016/05/18952.php >>>> >>> >>> >>> _______________________________________________ >>> devel mailing list >>> de...@open-mpi.org >>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel >>> Link to this post: >>> http://www.open-mpi.org/community/lists/devel/2016/05/18953.php >>> >> >> >> >> -- >> Paul H. Hargrove phhargr...@lbl.gov >> Computer Languages & Systems Software (CLaSS) Group >> Computer Science Department Tel: +1-510-495-2352 >> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 >> >> _______________________________________________ >> devel mailing list >> de...@open-mpi.org >> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2016/05/18954.php >> > > > _______________________________________________ > devel mailing list > de...@open-mpi.org > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2016/05/18955.php > -- Paul H. Hargrove phhargr...@lbl.gov Computer Languages & Systems Software (CLaSS) Group Computer Science Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900