TERRY DONTJE <terry.don...@oracle.com> wrote:
>>>Can you build OMPI as a 32 bit library and see if that works any better?
>>So you mean I should leave the whole OFED stack as 64 bit and build only 
>>Open MPI as 32 bit?
>I believe the OFED user libraries will also need to be 32 bit, otherwise the 
>32 bit MPI libraries will not be able to use them.
>
>>How do I configure Open MPI so that it is definitely built as 32 bit?
>You need to set CFLAGS, CXXFLAGS, FFLAGS and FCFLAGS on the configure line, 
>replacing "-m64" with "-m32", or just adding "-m32" if "-m64" is not there.
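A minimal sketch of that flag rewrite as a POSIX shell function. `to_m32` is a hypothetical helper name (it is not shipped with Open MPI or OFED); it turns every `-m64` into `-m32`, and appends `-m32` when no `-m64`/`-m32` is present, for the four variables Terry lists.

```shell
#!/bin/sh
# to_m32 is a hypothetical helper (not part of Open MPI or OFED).
# It rewrites a compiler-flag string for a 32-bit build:
#   - every "-m64" word becomes "-m32" (an existing "-m32" is kept)
#   - if neither "-m64" nor "-m32" is present, "-m32" is appended
# Flags are split on whitespace, so arguments containing spaces are not preserved.
to_m32() {
    out=""
    found=no
    set -f                      # disable globbing while splitting the flags
    for f in $1; do
        case "$f" in
            -m64|-m32) f="-m32"; found=yes ;;
        esac
        out="${out:+$out }$f"
    done
    set +f
    [ "$found" = yes ] || out="${out:+$out }-m32"
    printf '%s\n' "$out"
}

# Print the 32-bit versions of the four flag variables from the environment.
for var in CFLAGS CXXFLAGS FFLAGS FCFLAGS; do
    eval "val=\${$var-}"
    printf '%s="%s"\n' "$var" "$(to_m32 "$val")"
done
```

The printed values can then go on the configure line, e.g. `./configure --prefix=/usr/mpi/gcc/openmpi-1.4.4 CFLAGS="-m32" CXXFLAGS="-m32" FFLAGS="-m32" FCFLAGS="-m32"` when the variables start out empty. The OFED user libraries still have to exist as 32-bit builds for the result to link.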


Maybe this is interesting for the OFED developers:
To get OFED's 'install.pl' working with '--build32' on sparc64, I had to add 
the following lines (marked with +):
...
elsif ($arch eq "ppc64") {
    $target_cpu32 = 'ppc';
}
+elsif ($arch eq "sparc64") {
+    $target_cpu32 = 'sparc';
+}
...
After that, the selected OFED libraries were built in both 32-bit and 64-bit 
versions.


Hello Terry,

I was able to build 32-bit versions of
- openmpi-1.4.4
- osu_benchmarks-3.1.1
and link them against the required 32-bit OFED libraries.
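One way to double-check that rebuilt binaries really are 32-bit is to read the ELF class byte (e_ident[EI_CLASS], offset 4 in the ELF header: 1 = 32-bit, 2 = 64-bit). This is a sketch assuming a POSIX shell with `od`; `elf_class` is a hypothetical helper name, and the paths are the install locations that appear in this mail.

```shell
#!/bin/sh
# elf_class is a hypothetical helper: it checks the ELF magic (0x7f 'E' 'L' 'F')
# and then reads byte 4 of the header (e_ident[EI_CLASS]).
# 1 = ELFCLASS32, 2 = ELFCLASS64.
elf_class() {
    magic=$(od -An -t x1 -N 4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    case $(od -An -t u1 -j 4 -N 1 "$1" | tr -d ' \n') in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Paths taken from this mail; adjust them for another install.
for f in /usr/mpi/gcc/openmpi-1.4.4/bin/mpirun \
         /usr/mpi/gcc/openmpi-1.4.4/lib/libmpi.so.0; do
    if [ -e "$f" ]; then
        printf '%s: %s\n' "$f" "$(elf_class "$f")"
    fi
done
```

The `file` utility reports the same information; the `od` version just avoids depending on its output format.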

Unfortunately, the problem is still the same. Thanks anyway for the good tip 
to try the 32-bit version!

This is the error message I get:
# /usr/mpi/gcc/openmpi-1.4.4/bin/mpirun -np 2 -host ib1,ib2 ~/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency
# OSU MPI Latency Test v3.1.1
# Size            Latency (us)
[cluster1:61532] *** Process received signal ***
[cluster1:61532] Signal: Bus error (10)
[cluster1:61532] Signal code: Invalid address alignment (1)
[cluster1:61532] Failing at address: 0x898a53
[cluster1:61532] [ 0] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_pml_ob1.so(+0x50e0) [0xf72090e0]
[cluster1:61532] [ 1] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_coll_tuned.so(+0x1750) [0xf6fe9750]
[cluster1:61532] [ 2] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_coll_tuned.so(+0x8e5c) [0xf6ff0e5c]
[cluster1:61532] [ 3] /usr/mpi/gcc/openmpi-1.4.4/lib/libmpi.so.0(PMPI_Barrier+0xc0) [0xf77b718c]
[cluster1:61532] [ 4] /root/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency(main+0x2c8) [0x10cb0]
[cluster1:61532] [ 5] /lib/libc.so.6(__libc_start_main+0x10c) [0xf73e464c]
[cluster1:61532] [ 6] /root/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency(_start+0x2c) [0x1090c]
[cluster1:61532] *** End of error message ***
[cluster2:07039] *** Process received signal ***
[cluster2:07039] Signal: Bus error (10)
[cluster2:07039] Signal code: Invalid address alignment (1)
[cluster2:07039] Failing at address: 0x898a53
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 61532 on node cluster1 exited on signal 10 (Bus error).
--------------------------------------------------------------------------
[cluster2:07039] [ 0] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_pml_ob1.so(+0x50e0) [0xf77750e0]
[cluster2:07039] [ 1] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_coll_tuned.so(+0x1750) [0xf7555750]
[cluster2:07039] [ 2] /usr/mpi/gcc/openmpi-1.4.4/lib/openmpi/mca_coll_tuned.so(+0x8e5c) [0xf755ce5c]
[cluster2:07039] [ 3] /usr/mpi/gcc/openmpi-1.4.4/lib/libmpi.so.0(PMPI_Barrier+0xc0) [0xf7d3318c]
[cluster2:07039] [ 4] /root/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency(main+0x2c8) [0x10cb0]
[cluster2:07039] [ 5] /lib/libc.so.6(__libc_start_main+0x10c) [0xf796464c]
[cluster2:07039] [ 6] /root/razik/src/OFED-1.5.4-rc4/SRPMS/mpitests-3.2/osu_benchmarks-3.1.1/osu_latency(_start+0x2c) [0x1090c]
[cluster2:07039] *** End of error message ***

# ldd /usr/mpi/gcc/openmpi-1.4.4/bin/mpirun
        libopen-rte.so.0 => /usr/mpi/gcc/openmpi-1.4.4/lib/libopen-rte.so.0 (0xf7c18000)
        libopen-pal.so.0 => /usr/mpi/gcc/openmpi-1.4.4/lib/libopen-pal.so.0 (0xf7bbc000)
        libdl.so.2 => /lib/libdl.so.2 (0xf7b90000)
        libnsl.so.1 => /lib/libnsl.so.1 (0xf7b68000)
        libutil.so.1 => /lib/libutil.so.1 (0xf7b54000)
        libm.so.6 => /lib/libm.so.6 (0xf7a70000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xf7a44000)
        libc.so.6 => /lib/libc.so.6 (0xf78c4000)
        /lib/ld-linux.so.2 (0x70000000)
---

Best regards,
Lukas
