On Oct 22, 2007, at 2:49 PM, Bogdan Costescu wrote:
<short version>
Is there a known incompatibility between the latest stable Open MPI
versions and the PathScale 3.0 compilers?
</short version>
There is one in the openib BTL. We've had an open issue with PathScale
for many months. They're able to reproduce the error and have
narrowed it down to a single .o file, but have not yet found the
specific problem (that was the last I heard, a few months ago).
To be honest, I removed the PathScale suite from my regular
regression testing many months ago because of this long-standing
problem; I don't know whether any other PathScale-specific issues have
crept in since then.
<long version>
I have a very puzzling problem with the following combination:
- PathScale 3.0 suite
- Open MPI 1.2.3 and 1.2.4 (both behave the same)
- Debian etch, kernel 2.6.22.9/x86_64 running on AMD Opteron
I just recompiled the OMPI 1.2 branch with PathScale 3.0 on RHEL4U4
and I do not see the problems that you are seeing. :-\ Is Debian
etch a supported PathScale platform?
[13:44] svbu-mpi:/home/jsquyres/openmpi-1.2.4 % ompi_info
Open MPI: 1.2.4
Open MPI SVN revision: r16187
Open RTE: 1.2.4
Open RTE SVN revision: r16187
OPAL: 1.2.4
OPAL SVN revision: r16187
Prefix: /home/jsquyres/bogus
Configured architecture: x86_64-unknown-linux-gnu
Configured by: jsquyres
Configured on: Mon Oct 22 13:34:17 PDT 2007
Configure host: svbu-mpi.cisco.com
Built by: jsquyres
Built on: Mon Oct 22 13:40:55 PDT 2007
Built host: svbu-mpi.cisco.com
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: pathcc
C compiler absolute: /opt/pathscale/3.0/bin/pathcc
C++ compiler: pathCC
C++ compiler absolute: /opt/pathscale/3.0/bin/pathCC
Fortran77 compiler: pathf90
Fortran77 compiler abs: /opt/pathscale/3.0/bin/pathf90
Fortran90 compiler: pathf90
Fortran90 compiler abs: /opt/pathscale/3.0/bin/pathf90
C profiling: yes
....etc.
Upon invoking any installed binary (ompi_info, mpif90 --showinfo), I
get a segmentation fault. The trace looks strange (to me, at
least ;-)):
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004430d9 in _int_free (av=0x5b1ea0, mem=0x5b40b0) at /home/thor1/costescu/build/openmpi-1.2.4/opal/mca/memory/ptmalloc2/malloc.c:4416
4416 fwd->bk = p;
(gdb) bt
#0 0x00000000004430d9 in _int_free (av=0x5b1ea0, mem=0x5b40b0) at /home/thor1/costescu/build/openmpi-1.2.4/opal/mca/memory/ptmalloc2/malloc.c:4416
#1 0x000000000044141b in free (mem=0x5b40b0) at /home/thor1/costescu/build/openmpi-1.2.4/opal/mca/memory/ptmalloc2/malloc.c:3513
#2 0x00002b27dc920590 in vasprintf () from /lib/libc.so.6
#3 0x00002b27dc906588 in asprintf () from /lib/libc.so.6
#4 0x0000000000421274 in opal_output_init () at /home/thor1/costescu/build/openmpi-1.2.4/opal/util/output.c:130
#5 0x0000000000421c83 in do_open (output_id=-1, lds=0x591530) at /home/thor1/costescu/build/openmpi-1.2.4/opal/util/output.c:422
#6 0x0000000000421529 in opal_output_open (lds=0x591530) at /home/thor1/costescu/build/openmpi-1.2.4/opal/util/output.c:176
#7 0x00000000004201e4 in opal_malloc_init () at /home/thor1/costescu/build/openmpi-1.2.4/opal/util/malloc.c:67
#8 0x000000000040e6ac in opal_init_util () at runtime/opal_init.c:137
#9 0x000000000040932e in main (argc=2, argv=0x7fffceb02608) at /home/thor1/costescu/build/openmpi-1.2.4/opal/tools/wrappers/opal_wrapper.c:424
This happens only with the PathScale 3.0 compilers. I have no problems
when using the default gcc 4.1.2 compilers (and friends), and I also
have no problems using the PathScale 3.0 compilers either alone or
with Myricom's MPICH/MX.
The failing build was configured with:
./configure --prefix=/home/thor1/costescu/openmpi-1.2.4-ps30 \
    --enable-static --disable-shared --with-mx=/opt_local/mx \
    --disable-io-romio --enable-debug --enable-pretty-print-stacktrace
(configure and make logs available on request)
I thought about asking here first, before filing a bug report, to
avoid any 'this is known' or embarrassing errors that I might have
made. The existing bugs related to PathScale compilers don't seem
to describe the symptoms that I'm seeing, unless it's some kind of
threading issue, which seems to have no resolution yet...
Thanks in advance !
</long version>
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Jeff Squyres
Cisco Systems