Yep, I use ldd every day. But here the problem comes from a corrupted
structure shared between MorphMPI and MPI:
typedef struct {
    int   MorphMPI_SOURCE;
    int   MorphMPI_TAG;
    int   MorphMPI_ERROR;
    void *mpi_status;
} MorphMPI_Status;
where the member mpi_status is used to point to a real MPI_Status. In MPICH:
typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    int count;
} MPI_Status;
Then, when my MorphMPI_Status is given to MorphMPI_Get_count(), the
member MorphMPI_Status::mpi_status itself is not corrupted, but the
count field of the MPI_Status it points to is: the value should be 4,
not "random".
I tried to manipulate the MorphMPI_Status structure (adding another
integer to align it to 64 bits, keeping only the void*, ...) without success.
As a reminder, this problem appears only when the MPI library is used
through a dynamically linked MorphMPI library.
Does someone have an idea?
Mathieu Gontier
Core Development Engineer
Read the attached v-card for telephone, fax, address
Look at our web-site http://www.fft.be
Joe Landman wrote:
Greetings Mathieu:
Mathieu Gontier wrote:
[...]
So, I am meeting a little problem whatever MPI library is used (I tried
with MPICH-1.2.5.2, MPICH-GM and Intel MPI).
When MorphMPI is linked statically with my parallel application,
everything is ok; but when MorphMPI is linked dynamically with my
parallel application, MPI_Get_count returns a wrong value.
I concluded it is difficult to use an MPI library through a shared
library. I wonder if someone has more information about it (in this
Not likely. I would suggest ldd. It is your friend.
For example:
[EMAIL PROTECTED]:~/workspace/source-mpi$ ldd matmul_mpi_3.exe
libm.so.6 => /lib/libm.so.6 (0x00002b5409d17000)
libmpi.so.0 => not found
libopen-rte.so.0 => not found
libopen-pal.so.0 => not found
librt.so.1 => /lib/librt.so.1 (0x00002b5409f99000)
libdl.so.2 => /lib/libdl.so.2 (0x00002b540a1a2000)
libnsl.so.1 => /lib/libnsl.so.1 (0x00002b540a3a6000)
libutil.so.1 => /lib/libutil.so.1 (0x00002b540a5c0000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00002b540a7c3000)
libc.so.6 => /lib/libc.so.6 (0x00002b540a9de000)
/lib64/ld-linux-x86-64.so.2 (0x00002b5409af9000)
Notice that libmpi.so.0 is not found, so I can't run this by hand
unless I force the issue using LD_LIBRARY_PATH:
[EMAIL PROTECTED]:~/workspace/source-mpi$ export
LD_LIBRARY_PATH="/home/joe/local/lib64/:/home/joe/local/lib/"
[EMAIL PROTECTED]:~/workspace/source-mpi$ ldd matmul_mpi_3.exe
libm.so.6 => /lib/libm.so.6 (0x00002ae35ca50000)
libmpi.so.0 => /home/joe/local/lib/libmpi.so.0
(0x00002ae35ccd1000)
libopen-rte.so.0 => /home/joe/local/lib/libopen-rte.so.0
(0x00002ae35cfe8000)
libopen-pal.so.0 => /home/joe/local/lib/libopen-pal.so.0
(0x00002ae35d2b3000)
librt.so.1 => /lib/librt.so.1 (0x00002ae35d514000)
libdl.so.2 => /lib/libdl.so.2 (0x00002ae35d71d000)
libnsl.so.1 => /lib/libnsl.so.1 (0x00002ae35d921000)
libutil.so.1 => /lib/libutil.so.1 (0x00002ae35db3b000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00002ae35dd3e000)
libc.so.6 => /lib/libc.so.6 (0x00002ae35df59000)
/lib64/ld-linux-x86-64.so.2 (0x00002ae35c832000)
and it might even run ...
[EMAIL PROTECTED]:~/workspace/source-mpi$ ./matmul_mpi_3.exe
D[tid=0]: running on machine = pegasus-i
D: checking arguments: N_args=1
D: arg[0] = ./matmul_mpi_3.exe
Allocating memory ...
array size in MB = 7.629 MB
(remember, you have 2 of these)
normalization a: 0.05510, b: 0.00173
0 : loop_min = 0, loop_max = 1000
...
Do you have some sort of LD_LIBRARY_PATH set up? Or something set in
/etc/ld.so.conf that points to where these things are? Remember,
mpirun/mpiexec's alternative purpose in life is to set up the correct
run-time environment for you, so you might want to see what is going
on with the environment in your equivalent command.
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf