Ralph,

When rebuilding with --enable-debug and the original gcc-4.0.0, the SEGV returns. So the ompi-1.4 install in LD_LIBRARY_PATH was NOT the cause.
Below is a backtrace from gdb that includes line numbers. The SEGV is in strlen() (called from strdup()), which suggests a string that lacks null-termination. The initial (siginfo) part of the backtrace provided by Open MPI reads:

[pcp-j-6:02741] *** Process received signal ***
[pcp-j-6:02741] Signal: Segmentation fault (11)
[pcp-j-6:02741] Signal code: Address not mapped (1)
[pcp-j-6:02741] Failing at address: 0x63757274

-Paul

#0 0x00a5dbb3 in strlen () from /lib/libc.so.6
#1 0x00a5d8f5 in strdup () from /lib/libc.so.6
#2 0x00534a3b in mca_base_var_enum_create (name=0x349488 "coll_ml_enable_fragmentation_enum", values=0x34e014, enumerator=0xbfe03dd0)
    at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_var_enum.c:133
#3 0x0033c328 in mca_coll_ml_register_params ()
    at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/ompi/mca/coll/ml/coll_ml_mca.c:257
#4 0x00537585 in register_components (project_name=0x2056f3 "ompi", type_name=0x2056f8 "coll", output_id=-1, src=0xbfe03e7c, dest=0x21bd10)
    at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_components_register.c:116
#5 0x0053736a in mca_base_framework_components_register (framework=0x21bce0, flags=MCA_BASE_REGISTER_DEFAULT)
    at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_components_register.c:67
#6 0x00537ec1 in mca_base_framework_register (framework=0x21bce0, flags=MCA_BASE_REGISTER_DEFAULT)
    at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_framework.c:107
#7 0x00537f6f in mca_base_framework_open (framework=0x21bce0, flags=MCA_BASE_OPEN_DEFAULT)
    at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/opal/mca/base/mca_base_framework.c:131
#8 0x00152831 in ompi_mpi_init (argc=1, argv=0xbfe04114, requested=0, provided=0xbfe0400c)
    at /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/ompi/runtime/ompi_mpi_init.c:555
#9 0x00186ce1 in PMPI_Init (argc=0xbfe04090, argv=0xbfe04094) at pinit.c:84
#10 0x080486e9 in main (argc=1, argv=0xbfe04114) at ring_c.c:19

On Wed, Jan 8, 2014 at 8:45 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Only takes <30 seconds of typing to start the test and I get email when it
> is done.
> Typing these emails takes more of my time than the actual testing does.
>
> -Paul
>
>
> On Wed, Jan 8, 2014 at 8:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> If you have the time, it might be worth nailing it down. However, I'm
>> mindful of all the things you need to do, so please only if you have the
>> time.
>>
>> Thanks
>> Ralph
>>
>> On Jan 8, 2014, at 8:23 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> Ralph,
>>
>> Building with gcc-4.1.2 fixed the problem for me. I also removed an old
>> install of ompi-1.4 that was in LD_LIBRARY_PATH at build time and might
>> have been a contributing factor. If I'd known earlier that it was there, I
>> wouldn't have reported the problem without first removing it.
>>
>> I can build again with gcc-4.0.0 and --enable-debug if you are still
>> interested in trying to get a line number. This would also determine if
>> LD_LIBRARY_PATH was the true culprit.
>>
>> -Paul [Sent from my phone]
>> On Jan 8, 2014 8:02 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
>>
>>> Most likely problem is a bad backing store site - any chance you could
>>> give me a line number from this? There are a lot of calls to register
>>> params in that code and I'd need some help in figuring out which one wasn't
>>> right.
>>>
>>>
>>> On Jan 8, 2014, at 6:59 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>> I am still testing the current 1.7.4rc tarball on my various systems.
>>> The latest failure (shown below) is a SEGV somewhere below MPI_Init on an
>>> old, but otherwise fairly normal, Linux/x86 (32-bit) system.
>>>
>>> $ /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/bin/mpirun
>>> -np 1 examples/ring_c
>>> [pcp-j-6:29031] *** Process received signal ***
>>> [pcp-j-6:29031] Signal: Segmentation fault (11)
>>> [pcp-j-6:29031] Signal code: Address not mapped (1)
>>> [pcp-j-6:29031] Failing at address: 0x6c6c6f63
>>> [pcp-j-6:29031] [ 0] [0xbe4440]
>>> [pcp-j-6:29031] [ 1]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d)
>>> [0x2b11ed]
>>> [pcp-j-6:29031] [ 2]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639)
>>> [0x440909]
>>> [pcp-j-6:29031] [ 3]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e)
>>> [0x2b2cce]
>>> [pcp-j-6:29031] [ 4]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5)
>>> [0x2b32a5]
>>> [pcp-j-6:29031] [ 5]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_open+0x4e)
>>> [0x2b333e]
>>> [pcp-j-6:29031] [ 6]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_mpi_init+0x53d)
>>> [0xaf359d]
>>> [pcp-j-6:29031] [ 7]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(MPI_Init+0x13d)
>>> [0xb10d6d]
>>> [pcp-j-6:29031] [ 8] examples/ring_c [0x80486e9]
>>> [pcp-j-6:29031] [ 9] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
>>> [pcp-j-6:29031] [10] examples/ring_c [0x8048631]
>>> [pcp-j-6:29031] *** End of error message ***
>>>
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 29031 on node pcp-j-6 exited
>>> on signal 11 (Segmentation fault).
>>>
>>> --------------------------------------------------------------------------
>>>
>>> The failure shown is for a singleton run, but np=2 fails as well.
>>>
>>> System info:
>>> $ uname -a
>>> Linux pcp-j-6 2.6.18-238.1.1.el5PAE #1 SMP Tue Jan 18 19:28:42 EST 2011
>>> i686 athlon i386 GNU/Linux
>>> $ gcc --version
>>> gcc (GCC) 4.0.0
>>> Copyright (C) 2005 Free Software Foundation, Inc.
>>> This is free software; see the source for copying conditions. There is NO
>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>
>>> The only configure argument used was --prefix.
>>>
>>> I was going to attach output from "ompi_info --all", but it SEGV's too!
>>>
>>> $ ompi_info --all
>>> [pcp-j-6:29092] *** Process received signal ***
>>> [pcp-j-6:29092] Signal: Segmentation fault (11)
>>> [pcp-j-6:29092] Signal code: Address not mapped (1)
>>> [pcp-j-6:29092] Failing at address: 0x6c6c6f63
>>> [pcp-j-6:29092] [ 0] [0xd8a440]
>>> [pcp-j-6:29092] [ 1]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d)
>>> [0x2db1ed]
>>> [pcp-j-6:29092] [ 2]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639)
>>> [0x48d909]
>>> [pcp-j-6:29092] [ 3]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e)
>>> [0x2dccce]
>>> [pcp-j-6:29092] [ 4]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5)
>>> [0x2dd2a5]
>>> [pcp-j-6:29092] [ 5]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(opal_info_register_project_frameworks+0x57)
>>> [0x2b83d7]
>>> [pcp-j-6:29092] [ 6]
>>> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_info_register_framework_params+0x81)
>>> [0xa69251]
>>> [pcp-j-6:29092] [ 7] ompi_info(main+0x2ba) [0x8049a2a]
>>> [pcp-j-6:29092] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
>>> [pcp-j-6:29092] [ 9] ompi_info [0x80496e1]
>>> [pcp-j-6:29092] *** End of error message ***
>>> Segmentation fault (core dumped)
>>>
>>> I will try again with a newer gcc and report back.
>>>
>>> -Paul
>>>
>>> --
>>> Paul H. Hargrove phhargr...@lbl.gov
>>> Future Technologies Group
>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel