Shoot, my bad. It looks like the sentinel entry is missing from the
enumerator's value list. I will fix it now.
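
For anyone following along, here is a minimal sketch of the failure mode. It
is not the actual coll/ml code: the type and names (enum_value_t,
count_enum_values, fragmentation_values) and the three value strings are
hypothetical stand-ins, and the sketch assumes the value table is expected to
end with an entry whose string is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of an enumerator value table. */
typedef struct {
    int value;
    const char *string;
} enum_value_t;

/* Count entries by walking the table until the sentinel (string == NULL).
 * Without a sentinel this loop reads past the end of the array and treats
 * whatever bytes follow as a string pointer. */
static size_t count_enum_values(const enum_value_t *values)
{
    size_t n = 0;

    assert(values != NULL);
    while (values[n].string != NULL) {
        n++;
    }
    return n;
}

/* Correctly terminated table (illustrative values only). */
static const enum_value_t fragmentation_values[] = {
    {0, "disable"},
    {1, "enable"},
    {2, "auto"},
    {0, NULL}, /* sentinel: omitting this line is the suspected bug */
};
```

With the sentinel in place, count_enum_values(fragmentation_values) stops
after three entries. Drop the last line of the table and the walk continues
into adjacent memory, so a later strdup() on a bogus pointer can crash in
strlen(), which matches the backtrace below.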

-Nathan

On Wed, Jan 08, 2014 at 09:27:46PM -0800, Paul Hargrove wrote:
>    Ralph,
>    When rebuilding with --enable-debug and the original gcc-4.0.0 the SEGV
>    returns.
>    So, the ompi-1.4 in the LD_LIBRARY_PATH was NOT the cause.
>    Below is a backtrace from gdb which includes line numbers.
>    The SEGV is in strlen() which suggests a string which lacks
>    null-termination.
>    The initial (siginfo) part of the backtrace provided by Open MPI reads:
>    [pcp-j-6:02741] *** Process received signal ***
>    [pcp-j-6:02741] Signal: Segmentation fault (11)
>    [pcp-j-6:02741] Signal code: Address not mapped (1)
>    [pcp-j-6:02741] Failing at address: 0x63757274
>    -Paul
>    #0  0x00a5dbb3 in strlen () from /lib/libc.so.6
>    #1  0x00a5d8f5 in strdup () from /lib/libc.so.6
>    #2  0x00534a3b in mca_base_var_enum_create (name=0x349488 "coll_ml_enable_fragmentation_enum",
>        values=0x34e014, enumerator=0xbfe03dd0)
>        at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_var_enum.c:133
>    #3  0x0033c328 in mca_coll_ml_register_params ()
>        at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/ompi/mca/coll/ml/coll_ml_mca.c:257
>    #4  0x00537585 in register_components (project_name=0x2056f3 "ompi", type_name=0x2056f8 "coll",
>        output_id=-1, src=0xbfe03e7c, dest=0x21bd10)
>        at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_components_register.c:116
>    #5  0x0053736a in mca_base_framework_components_register (framework=0x21bce0,
>        flags=MCA_BASE_REGISTER_DEFAULT)
>        at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_components_register.c:67
>    #6  0x00537ec1 in mca_base_framework_register (framework=0x21bce0, flags=MCA_BASE_REGISTER_DEFAULT)
>        at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_framework.c:107
>    #7  0x00537f6f in mca_base_framework_open (framework=0x21bce0, flags=MCA_BASE_OPEN_DEFAULT)
>        at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_framework.c:131
>    #8  0x00152831 in ompi_mpi_init (argc=1, argv=0xbfe04114, requested=0, provided=0xbfe0400c)
>        at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/ompi/runtime/ompi_mpi_init.c:555
>    #9  0x00186ce1 in PMPI_Init (argc=0xbfe04090, argv=0xbfe04094) at pinit.c:84
>    #10 0x080486e9 in main (argc=1, argv=0xbfe04114) at ring_c.c:19
> 
>    On Wed, Jan 8, 2014 at 8:45 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
>      Only takes <30 seconds of typing to start the test and I get email when
>      it is done.
>      Typing these emails takes more of my time than the actual testing does.
>      -Paul
> 
>      On Wed, Jan 8, 2014 at 8:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>        If you have the time, it might be worth nailing it down. However, I'm
>        mindful of all the things you need to do, so please only if you have
>        the time.
>        Thanks
>        Ralph
>        On Jan 8, 2014, at 8:23 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
>          Ralph,
> 
>          Building with gcc-4.1.2 fixed the problem for me.  I also removed an
>          old install of ompi-1.4 that was in LD_LIBRARY_PATH at build time
>          and might have been a contributing factor.  If I'd known earlier
>          that it was there, I wouldn't have reported the problem without
>          first removing it.
> 
>          I can build again with gcc-4.0.0 and --enable-debug if you are still
>          interested in trying to get a line number.  This would also
>          determine if LD_LIBRARY_PATH was the true culprit.
> 
>          -Paul [Sent from my phone]
> 
>          On Jan 8, 2014 8:02 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
> 
>            Most likely problem is a bad backing store site - any chance you
>            could give me a line number from this? There are a lot of calls to
>            register params in that code and I'd need some help in figuring
>            out which one wasn't right.
>            On Jan 8, 2014, at 6:59 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
>              I am still testing the current 1.7.4rc tarball on my various
>              systems.  The latest failure (shown below) is a SEGV somewhere
>              below MPI_Init on an old, but otherwise fairly normal, Linux/x86
>              (32-bit) system.
>              $ /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/bin/mpirun -np 1 examples/ring_c
>              [pcp-j-6:29031] *** Process received signal ***
>              [pcp-j-6:29031] Signal: Segmentation fault (11)
>              [pcp-j-6:29031] Signal code: Address not mapped (1)
>              [pcp-j-6:29031] Failing at address: 0x6c6c6f63
>              [pcp-j-6:29031] [ 0] [0xbe4440]
>              [pcp-j-6:29031] [ 1] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d) [0x2b11ed]
>              [pcp-j-6:29031] [ 2] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639) [0x440909]
>              [pcp-j-6:29031] [ 3] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e) [0x2b2cce]
>              [pcp-j-6:29031] [ 4] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5) [0x2b32a5]
>              [pcp-j-6:29031] [ 5] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_open+0x4e) [0x2b333e]
>              [pcp-j-6:29031] [ 6] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_mpi_init+0x53d) [0xaf359d]
>              [pcp-j-6:29031] [ 7] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(MPI_Init+0x13d) [0xb10d6d]
>              [pcp-j-6:29031] [ 8] examples/ring_c [0x80486e9]
>              [pcp-j-6:29031] [ 9] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
>              [pcp-j-6:29031] [10] examples/ring_c [0x8048631]
>              [pcp-j-6:29031] *** End of error message ***
>              --------------------------------------------------------------------------
>              mpirun noticed that process rank 0 with PID 29031 on node
>              pcp-j-6 exited on signal 11 (Segmentation fault).
>              --------------------------------------------------------------------------
>              The failure shown is for a singleton run, but np=2 fails as
>              well.
>              System info:
>              $ uname -a
>              Linux pcp-j-6 2.6.18-238.1.1.el5PAE #1 SMP Tue Jan 18 19:28:42 EST 2011 i686 athlon i386 GNU/Linux
>              $ gcc --version
>              gcc (GCC) 4.0.0
>              Copyright (C) 2005 Free Software Foundation, Inc.
>              This is free software; see the source for copying conditions.  There is NO
>              warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>              The only configure argument used was --prefix.
>              I was going to attach output from "ompi_info --all", but it
>              SEGV's too!
>              $ ompi_info --all 
>              [pcp-j-6:29092] *** Process received signal ***
>              [pcp-j-6:29092] Signal: Segmentation fault (11)
>              [pcp-j-6:29092] Signal code: Address not mapped (1)
>              [pcp-j-6:29092] Failing at address: 0x6c6c6f63
>              [pcp-j-6:29092] [ 0] [0xd8a440]
>              [pcp-j-6:29092] [ 1] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d) [0x2db1ed]
>              [pcp-j-6:29092] [ 2] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639) [0x48d909]
>              [pcp-j-6:29092] [ 3] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e) [0x2dccce]
>              [pcp-j-6:29092] [ 4] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5) [0x2dd2a5]
>              [pcp-j-6:29092] [ 5] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(opal_info_register_project_frameworks+0x57) [0x2b83d7]
>              [pcp-j-6:29092] [ 6] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_info_register_framework_params+0x81) [0xa69251]
>              [pcp-j-6:29092] [ 7] ompi_info(main+0x2ba) [0x8049a2a]
>              [pcp-j-6:29092] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
>              [pcp-j-6:29092] [ 9] ompi_info [0x80496e1]
>              [pcp-j-6:29092] *** End of error message ***
>              Segmentation fault (core dumped)
>              I will try again with a newer gcc and report back.
>              -Paul
>              --
>              Paul H. Hargrove                          phhargr...@lbl.gov
>              Future Technologies Group
>              Computer and Data Sciences Department     Tel: +1-510-495-2352
>              Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>              _______________________________________________
>              devel mailing list
>              de...@open-mpi.org
>              http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
