On Wed, Jan 22, 2014 at 2:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> My ia64 asm is a bit rusty, but I'll give a quick look if/when I can.


I had a look (in v1.7) and this is what I see:

$cat -n IA64.asm | grep -A14 opal_atomic_cmpset_acq_64:
    70  opal_atomic_cmpset_acq_64:
    71          .prologue
    72          .body
    73          mov ar.ccv=r33;;
    74          cmpxchg8.acq r32=[r32],r34,ar.ccv
    75          ;;
    76          sxt4 r32 = r32
    77          ;;
    78          cmp.eq p6, p7 = r33, r32
    79          ;;
    80          (p6) addl r8 = 1, r0
    81          (p7) mov r8 = r0
    82          br.ret.sptk.many b0
    83          ;;
    84          .endp opal_atomic_cmpset_acq_64#

The (approximate and non-atomic) C equivalent is:

// r32 = address
// r33 = oldvalue
// r34 = newvalue
int opal_atomic_cmpset_acq_64(int64_t r32, int64_t r33, int64 r34) {
   int64_t ccv = r33; // L73
   if (*(int64_t *)r32 == ccv) *(int64_t *)r32 = r34; // L74

   r32 = (int64_t)(int32_t)r32; // L76 = sign-extend 32->64

   bool p6, p7;
   p7 = !(p6 = (r33 == r32)); // L78

   const int r0 = 0;
   int r8;
   if (p6) r8 = 1 + r0; // L80
   if (p7) r8 = r0; // L81
   return r8; // L82
}

Which is fine except that line 76 is totally wrong!!
The "sxt4" instruction is "sign-extend from 4 bytes to 8 bytes".
Thus the upper 32-bits of the value read from memory are lost!
Unless the upper 33 bits off r33 (oldvalue) are all 0s or all 1s, the
comparison on line 78 MUST fail.
This explains the hang, as the lifo push will loop indefinitely waiting for
the success of this cmpset.

Note the same erroneous instruction is also present in the _rel variant (at
line 94).
The trunk has the same issue.
This code has not changed at all since IA64.asm was added way back in r4471.

I won't have access to the IA64 platform again until tomorrow AM.
So, testing my hypothesis will need to wait.

BTW:
IFF I am right about the source of this problem, then it would be
beneficial to have (and I may contribute) a stronger test (for "make
check") that would detect this sort of bug in the atomics (specifically
look for both false-positive and false-negative return value from 64-bit
cmpset operations with values satisfying a range of "corner cases").  I
think I have single-bit and double-bit "marching tests" for cmpset in my
own arsenal of tests for GASNet's atomics.  If I don't have time to
contribute a complete test, I can at least contribute that logic for
somebody else to port to the OPAL atomics.

-Paul

P.S.:
The cmpxchgN for N in 1,2,4 are documented as ZERO-extending their loads to
64-bits.
So, there is a slim chance that the sxt4 actually was intended for the
32-bit cmpset code.
However, since the comparison used there is a "cmp4.eq" the "sxt4" would
still not be needed.


-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

Reply via email to