On Wed, Jan 22, 2014 at 2:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> My ia64 asm is a bit rusty, but I'll give a quick look if/when I can. I had a look (in v1.7) and this is what I see: $cat -n IA64.asm | grep -A14 opal_atomic_cmpset_acq_64: 70 opal_atomic_cmpset_acq_64: 71 .prologue 72 .body 73 mov ar.ccv=r33;; 74 cmpxchg8.acq r32=[r32],r34,ar.ccv 75 ;; 76 sxt4 r32 = r32 77 ;; 78 cmp.eq p6, p7 = r33, r32 79 ;; 80 (p6) addl r8 = 1, r0 81 (p7) mov r8 = r0 82 br.ret.sptk.many b0 83 ;; 84 .endp opal_atomic_cmpset_acq_64# The (approximate and non-atomic) C equivalent is: // r32 = address // r33 = oldvalue // r34 = newvalue int opal_atomic_cmpset_acq_64(int64_t r32, int64_t r33, int64 r34) { int64_t ccv = r33; // L73 if (*(int64_t *)r32 == ccv) *(int64_t *)r32 = r34; // L74 r32 = (int64_t)(int32_t)r32; // L76 = sign-extend 32->64 bool p6, p7; p7 = !(p6 = (r33 == r32)); // L78 const int r0 = 0; int r8; if (p6) r8 = 1 + r0; // L80 if (p7) r8 = r0; // L81 return r8; // L82 } Which is fine except that line 76 is totally wrong!! The "sxt4" instruction is "sign-extend from 4 bytes to 8 bytes". Thus the upper 32-bits of the value read from memory are lost! Unless the upper 33 bits off r33 (oldvalue) are all 0s or all 1s, the comparison on line 78 MUST fail. This explains the hang, as the lifo push will loop indefinitely waiting for the success of this cmpset. Note the same erroneous instruction is also present in the _rel variant (at line 94). The trunk has the same issue. This code has not changed at all since IA64.asm was added way back in r4471. I won't have access to the IA64 platform again until tomorrow AM. So, testing my hypothesis will need to wait. BTW: IFF I am right about the source of this problem, then it would be beneficial to have (and I may contribute) a stronger test (for "make check") that would detect this sort of bug in the atomics (specifically look for both false-positive and false-negative return value from 64-bit cmpset operations with values satisfying a range of "corner cases"). I think I have single-bit and double-bit "marching tests" for cmpset in my own arsenal of tests for GASNet's atomics. If I don't have time to contribute a complete test, I can at least contribute that logic for somebody else to port to the OPAL atomics. -Paul P.S.: The cmpxchgN for N in 1,2,4 are documented as ZERO-extending their loads to 64-bits. So, there is a slim chance that the sxt4 actually was intended for the 32-bit cmpset code. However, since the comparison used there is a "cmp4.eq" the "sxt4" would still not be needed. -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900