Yes, seriously. This code is still undergoing testing, which is part of
the reason it is on master. Once I am confident in the code I will be
updating some of my code to use a fifo instead of an opal_list_t and a
lock.

I don't know if the barrier will make a difference, but it is the only
place I could see for a possible inconsistency. If it makes no
difference I will dig deeper.
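
To illustrate the kind of ordering problem a missing memory barrier can
cause, here is a minimal sketch. It is not the opal_fifo code: the names
node_t, slot, published and run_barrier_demo are hypothetical, and it
assumes C11 atomics and POSIX threads. The writer fills in an item and
then publishes a pointer to it. The release fence keeps the fill from
being reordered after the publish, and the reader pairs it with an
acquire fence before touching the item.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical item type for illustration; this is not opal_list_item_t. */
typedef struct node {
    int payload;
} node_t;

static node_t slot;                  /* plain data filled in by the writer */
static _Atomic(node_t *) published;  /* pointer the reader polls */

static void *writer(void *arg)
{
    (void) arg;
    slot.payload = 42;                          /* 1. fill in the item */
    atomic_thread_fence(memory_order_release);  /* 2. barrier: 1 stays before 3 */
    atomic_store_explicit(&published, &slot,    /* 3. publish the pointer */
                          memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg)
{
    node_t *item;

    /* Spin until the pointer is visible; a NULL pointer is never dereferenced. */
    while (NULL == (item = atomic_load_explicit(&published,
                                                memory_order_relaxed))) {
    }
    /* Pairs with the writer's release fence, so the payload write is visible. */
    atomic_thread_fence(memory_order_acquire);
    *(int *) arg = item->payload;
    return NULL;
}

/* Runs one writer and one reader; returns the payload the reader observed. */
int run_barrier_demo(void)
{
    pthread_t w, r;
    int seen = 0;

    atomic_store(&published, (node_t *) NULL);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Without the two fences the language gives no guarantee that the reader
sees the payload write once it sees the pointer. x86 hardware orders
stores strongly, but the compiler is still free to reorder them.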

-Nathan

On Thu, Feb 12, 2015 at 03:48:25PM -0500, George Bosilca wrote:
>    Seriously?
>      George.
>    On Thu, Feb 12, 2015 at 1:00 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> 
>      I think I see the issue. Looks like there is a missing memory barrier
>      after the head consistency code. I will add one and see if that fixes
>      your problem.
> 
>      BTW, I can't reproduce the issue on any of my systems :-/.
> 
>      -Nathan
>      On Thu, Feb 12, 2015 at 02:07:08AM -0800, Paul Hargrove wrote:
>      >    Just experienced the same failure as below with openmpi-dev-904-g08dceda
>      >    built with "gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)" on Scientific
>      >    Linux 7.x (a RHEL 7 clone).
>      >    gdb says:
>      >    Program received signal SIGSEGV, Segmentation fault.
>      >    [Switching to Thread 0x7ffff53b0700 (LWP 19685)]
>      >    0x0000000000401417 in opal_fifo_pop_atomic (fifo=0x7fffffffe130)
>      >        at /home/phargrov/OMPI/openmpi-master-linux-x86_64-sl7x/openmpi-dev-904-g08dceda/opal/class/opal_fifo.h:127
>      >    127             next = (opal_list_item_t *) item->opal_list_next;
>      >    -Paul
>      >    On Fri, Feb 6, 2015 at 4:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>      >
>      >      Yes, this time I really mean "fifo", not "lifo".  ;-)
>      >      With last night's master tarball (Open MPI dev-845-ga3275aa) configured
>      >      with only --prefix and --enable-debug
>      >      A Linux-86-64 system running debian Wheezy and compiler = "gcc (Debian
>      >      4.7.2-5) 4.7.2"
>      >      Failure from "make check":
>      >      /home/phargrov/OMPI/openmpi-master-linux-x86_64-wheezy/openmpi-dev-845-ga3275aa/config/test-driver:
>      >      line 95:  3697 Segmentation fault      "$@" > $log_file 2>&1
>      >      FAIL: opal_fifo
>      >      Manual run shows:
>      >      $ ./test/class/opal_fifo
>      >      Single thread test. Time: 0 s 33534 us 33 nsec/poppush
>      >      Atomics thread finished. Time: 0 s 82289 us 82 nsec/poppush
>      >      Atomics thread finished. Time: 4 s 844299 us 4844 nsec/poppush
>      >      Atomics thread finished. Time: 5 s 27642 us 5027 nsec/poppush
>      >      Atomics thread finished. Time: 5 s 65829 us 5065 nsec/poppush
>      >      Atomics thread finished. Time: 5 s 264239 us 5264 nsec/poppush
>      >      Atomics thread finished. Time: 5 s 432407 us 5432 nsec/poppush
>      >      Atomics thread finished. Time: 5 s 462913 us 5462 nsec/poppush
>      >      Atomics thread finished. Time: 5 s 466208 us 5466 nsec/poppush
>      >      Atomics thread finished. Time: 5 s 485575 us 5485 nsec/poppush
>      >      All threads finished. Thread count: 8 Time: 5 s 485844 us 685 nsec/poppush
>      >      Segmentation fault (core dumped)
>      >      When run within GDB:
>      >      Program received signal SIGSEGV, Segmentation fault.
>      >      [Switching to Thread 0x7ffff5c64700 (LWP 3948)]
>      >      0x0000000000401568 in opal_fifo_pop_atomic (fifo=0x7fffffffe830)
>      >          at /home/phargrov/OMPI/openmpi-master-linux-x86_64-wheezy/openmpi-dev-845-ga3275aa/opal/class/opal_fifo.h:127
>      >      127             next = (opal_list_item_t *) item->opal_list_next;
>      >      (gdb) print item
>      >      $1 = (opal_list_item_t *) 0x0
>      >      (gdb) where
>      >      #0  0x0000000000401568 in opal_fifo_pop_atomic (fifo=0x7fffffffe830)
>      >          at /home/phargrov/OMPI/openmpi-master-linux-x86_64-wheezy/openmpi-dev-845-ga3275aa/opal/class/opal_fifo.h:127
>      >      #1  0x000000000040193d in thread_test_exhaust (arg=0x7fffffffe830)
>      >          at /home/phargrov/OMPI/openmpi-master-linux-x86_64-wheezy/openmpi-dev-845-ga3275aa/test/class/opal_fifo.c:79
>      >      #2  0x00007ffff6ff9b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>      >      #3  0x00007ffff6d4370d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>      >      #4  0x0000000000000000 in ?? ()
>      >      -Paul
>      >      --
>      >      Paul H. Hargrove                          phhargr...@lbl.gov
>      >      Computer Languages & Systems Software (CLaSS) Group
>      >      Computer Science Department               Tel: +1-510-495-2352
>      >      Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>      >
>      >    --
>      >    Paul H. Hargrove                          phhargr...@lbl.gov
>      >    Computer Languages & Systems Software (CLaSS) Group
>      >    Computer Science Department               Tel: +1-510-495-2352
>      >    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> 
>      > _______________________________________________
>      > devel mailing list
>      > de...@open-mpi.org
>      > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>      > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/02/16975.php
> 
>      _______________________________________________
>      devel mailing list
>      de...@open-mpi.org
>      Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>      Link to this post:
>      http://www.open-mpi.org/community/lists/devel/2015/02/16978.php

> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/02/16979.php
