Some progress: I fixed IA64.asm but still saw failures. I realized I'd not checked the ia64/atomic.h file.
Lo and behold the origin of the bogus "sxt4" is a pair of improper casts, removed by the following: --- opal/include/opal/sys/ia64/atomic.h~ 2014-01-23 13:04:03.000000000 -0800 +++ opal/include/opal/sys/ia64/atomic.h 2014-01-23 13:04:42.000000000 -0800 @@ -119,7 +119,7 @@ __asm__ __volatile__ ("cmpxchg8.acq %0=[%1],%2,ar.ccv": "=r"(ret) : "r"(addr), "r"(newval) : "memory"); - return ((int32_t)ret == oldval); + return (ret == oldval); } @@ -132,7 +132,7 @@ __asm__ __volatile__ ("cmpxchg8.rel %0=[%1],%2,ar.ccv": "=r"(ret) : "r"(addr), "r"(newval) : "memory"); - return ((int32_t)ret == oldval); + return (ret == oldval); } #endif /* OMPI_GCC_INLINE_ASSEMBLY */ I will retest ASAP and report with, I hope, an attachment to fix both IA64.asm and ia64/atomic.h -Paul On Wed, Jan 22, 2014 at 4:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote: > > On Wed, Jan 22, 2014 at 2:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote: > >> My ia64 asm is a bit rusty, but I'll give a quick look if/when I can. > > > I had a look (in v1.7) and this is what I see: > > $cat -n IA64.asm | grep -A14 opal_atomic_cmpset_acq_64: > 70 opal_atomic_cmpset_acq_64: > 71 .prologue > 72 .body > 73 mov ar.ccv=r33;; > 74 cmpxchg8.acq r32=[r32],r34,ar.ccv > 75 ;; > 76 sxt4 r32 = r32 > 77 ;; > 78 cmp.eq p6, p7 = r33, r32 > 79 ;; > 80 (p6) addl r8 = 1, r0 > 81 (p7) mov r8 = r0 > 82 br.ret.sptk.many b0 > 83 ;; > 84 .endp opal_atomic_cmpset_acq_64# > > The (approximate and non-atomic) C equivalent is: > > // r32 = address > // r33 = oldvalue > // r34 = newvalue > int opal_atomic_cmpset_acq_64(int64_t r32, int64_t r33, int64 r34) { > int64_t ccv = r33; // L73 > if (*(int64_t *)r32 == ccv) *(int64_t *)r32 = r34; // L74 > > r32 = (int64_t)(int32_t)r32; // L76 = sign-extend 32->64 > > bool p6, p7; > p7 = !(p6 = (r33 == r32)); // L78 > > const int r0 = 0; > int r8; > if (p6) r8 = 1 + r0; // L80 > if (p7) r8 = r0; // L81 > return r8; // L82 > } > > Which is fine except that line 76 is totally wrong!! > The "sxt4" instruction is "sign-extend from 4 bytes to 8 bytes". > Thus the upper 32-bits of the value read from memory are lost! > Unless the upper 33 bits off r33 (oldvalue) are all 0s or all 1s, the > comparison on line 78 MUST fail. > This explains the hang, as the lifo push will loop indefinitely waiting > for the success of this cmpset. > > Note the same erroneous instruction is also present in the _rel variant > (at line 94). > The trunk has the same issue. > This code has not changed at all since IA64.asm was added way back in > r4471. > > I won't have access to the IA64 platform again until tomorrow AM. > So, testing my hypothesis will need to wait. > > BTW: > IFF I am right about the source of this problem, then it would be > beneficial to have (and I may contribute) a stronger test (for "make > check") that would detect this sort of bug in the atomics (specifically > look for both false-positive and false-negative return value from 64-bit > cmpset operations with values satisfying a range of "corner cases"). I > think I have single-bit and double-bit "marching tests" for cmpset in my > own arsenal of tests for GASNet's atomics. If I don't have time to > contribute a complete test, I can at least contribute that logic for > somebody else to port to the OPAL atomics. > > -Paul > > P.S.: > The cmpxchgN for N in 1,2,4 are documented as ZERO-extending their loads > to 64-bits. > So, there is a slim chance that the sxt4 actually was intended for the > 32-bit cmpset code. > However, since the comparison used there is a "cmp4.eq" the "sxt4" would > still not be needed. > > > -- > Paul H. Hargrove phhargr...@lbl.gov > Future Technologies Group > Computer and Data Sciences Department Tel: +1-510-495-2352 > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900