Yeah, I remember this one. It's a bug in that specific version of the compiler. I had reported it to the compiler team a couple of years back.
Quoting from the email I sent them:

The "stw r0,0(r31)" probably overwrites the previous stack pointer?

static inline int opal_atomic_cmpset_32(volatile int32_t *addr,
    10000580:   94 21 ff c0     stwu    r1,-64(r1)
    10000584:   93 e1 00 3c     stw     r31,60(r1)
    10000588:   7c 3f 0b 78     mr      r31,r1
    1000058c:   90 7f 00 24     stw     r3,36(r31)
    10000590:   90 9f 00 28     stw     r4,40(r31)
    10000594:   90 bf 00 2c     stw     r5,44(r31)
                                        int32_t oldval, int32_t newval)
{
   int32_t ret;

   __asm__ __volatile__ (
    10000598:   80 9f 00 28     lwz     r4,40(r31)
    1000059c:   80 7f 00 2c     lwz     r3,44(r31)
    100005a0:   80 1f 00 24     lwz     r0,36(r31)
   *100005a4:   90 1f 00 00     stw     r0,0(r31)*
    100005a8:   90 1f 00 04     stw     r0,4(r31)
    100005ac:   90 9f 00 08     stw     r4,8(r31)
    100005b0:   90 7f 00 0c     stw     r3,12(r31)
    100005b4:   90 1f 00 10     stw     r0,16(r31)
    100005b8:   80 7f 00 04     lwz     r3,4(r31)
    100005bc:   7c 80 18 28     lwarx   r4,0,r3
    100005c0:   80 1f 00 08     lwz     r0,8(r31)
    100005c4:   7c 04 00 00     cmpw    r4,r0
    100005c8:   90 9f 00 14     stw     r4,20(r31)
    100005cc:   90 7f 00 04     stw     r3,4(r31)
    100005d0:   90 1f 00 08     stw     r0,8(r31)
    100005d4:   40 82 00 1c     bne-    100005f0 <opal_atomic_cmpset_32+0x70>
    100005d8:   80 1f 00 0c     lwz     r0,12(r31)
    100005dc:   80 7f 00 04     lwz     r3,4(r31)
    100005e0:   7c 00 19 2d     stwcx.  r0,0,r3

Regards
--Nysal

On Fri, Apr 24, 2015 at 5:06 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Exhibit 1: the smoking gun
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x00000fffa4d6f184 in opal_atomic_cmpset_acq_32 (addr=Cannot access
> memory at address 0xd8
> )
>     at
> /home/hargrov1/OMPI/openmpi-1.8.5rc3-linux-ppc64-xlc-11.1/openmpi-1.8.5rc3/opal/include/opal/sys/powerpc/atomic.h:158
>
>
> So, this is a new symptom of the known inability of this compiler to get
> the inline asm right.
>
> Sorry for the false alarm,
> -Paul
>
> On Thu, Apr 23, 2015 at 4:09 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> I have a system w/ xlc-11.1.
>> It has essentially always failed "make check" in a LP64 build due to xlc
>> botching the atomics.
>> So, when it failed with 1.8.5.rc2 I didn't look closely.
>>
>> Today it has failed with rc3 and I *did* look closely and here is what I
>> see:
>>
>> PASS: predefined_gap_test
>> /bin/sh: line 5: 39766 Segmentation fault      ${dir}$tst
>> FAIL: dlopen_test
>> ========================================================
>> 1 of 2 tests failed
>> Please report to http://www.open-mpi.org/community/help/
>> ========================================================
>>
>> I also see the same in the rc2 results I hadn't examined closely before.
>> Meanwhile the rc1 failure was the known atomics-related one.
>>
>> So, UNLESS I find that the dlopen_test failure is related to the atomics
>> or some other problem specific to xlc, this may be a new issue related to
>> the elimination of the built-in libltdl. Note that this system.
>>
>> Here's hoping this is a new symptom, and not a new problem.
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department               Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17352.php
>
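
For anyone reading the listing at the top of this mail, here is a rough model in C of why the highlighted store looks suspicious. After "stwu r1,-64(r1)" the word at offset 0 of the new frame holds the caller's stack pointer (the back chain), and "mr r31,r1" makes r31 point at that same frame, so "stw r0,0(r31)" writes the value of addr on top of it. This is only a simulation of the stores shown in the listing. All the names (sketch_frame, sketch_prologue, sketch_spill_operands, sketch_back_chain_intact) are made up for illustration and are not Open MPI or compiler code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only: every name here is hypothetical. */
enum { SKETCH_FRAME_WORDS = 16 };  /* 64-byte frame, 4-byte words */

typedef struct {
    uint32_t word[SKETCH_FRAME_WORDS];  /* word[n] models offset 4*n from r31 */
} sketch_frame;

/* Models "stwu r1,-64(r1)": the caller's stack pointer (the back chain)
 * is stored at offset 0 of the new frame. */
static void sketch_prologue(sketch_frame *f, uint32_t caller_sp)
{
    assert(f != NULL);
    for (size_t i = 0; i < SKETCH_FRAME_WORDS; i++) {
        f->word[i] = 0;
    }
    f->word[0] = caller_sp;
}

/* Models the five stores at 100005a4..100005b4, with r0 = addr,
 * r4 = oldval and r3 = newval as loaded just before them. */
static void sketch_spill_operands(sketch_frame *f, uint32_t addr,
                                  uint32_t oldval, uint32_t newval)
{
    assert(f != NULL);
    f->word[0] = addr;    /* stw r0,0(r31): lands on the back chain slot */
    f->word[1] = addr;    /* stw r0,4(r31)  */
    f->word[2] = oldval;  /* stw r4,8(r31)  */
    f->word[3] = newval;  /* stw r3,12(r31) */
    f->word[4] = addr;    /* stw r0,16(r31) */
}

/* Returns 1 if offset 0 still holds the caller's stack pointer. */
static int sketch_back_chain_intact(const sketch_frame *f, uint32_t caller_sp)
{
    assert(f != NULL);
    return f->word[0] == caller_sp;
}
```

Calling sketch_prologue() and then sketch_spill_operands() on the same frame makes sketch_back_chain_intact() return 0, which is the clobber the highlighted instruction appears to cause. The model says nothing about where the crash in Paul's backtrace actually originates.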