Hi Luis,
My first step is usually to configure Open MPI with `--enable-debug` and
recompile/install. Then use DDT (or gdb inside an xterm per rank if DDT
is not available and you have X forwarding on the nodes). When the
segfault happens you at least get proper symbols inside Open MPI that
may hint at the problem. You can post your findings here of course.
It would also help to have more info on the platform and the version of
Open MPI you're running on :)
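In case it helps, here is a rough sketch of the rebuild and the gdb-per-rank launch as two small shell functions. The install prefix, the function names, and the `make -j 4` parallelism are placeholders I made up for illustration, so adjust them to your setup:

```shell
# Hypothetical install prefix for the debug build; adjust as needed.
OMPI_DEBUG_PREFIX="${OMPI_DEBUG_PREFIX:-$HOME/ompi-debug}"

# Rebuild Open MPI with debugging support.
# Run this from the top of the Open MPI source tree.
build_debug_ompi() {
    ./configure --prefix="$OMPI_DEBUG_PREFIX" --enable-debug &&
    make -j 4 &&
    make install
}

# Start one gdb per rank, each in its own xterm.
# Needs X forwarding from the compute nodes.
# $1 = number of ranks, $2 = path to the executable.
gdb_per_rank() {
    "$OMPI_DEBUG_PREFIX/bin/mpirun" -np "$1" xterm -e gdb "$2"
}
```

After `build_debug_ompi`, recompile the test program with the `mpicc` from the new prefix, then something like `gdb_per_rank 4 ./a.out` and type `run` in each gdb window. When the segfault hits, `bt` in the failing rank should show the frames inside `ompi_mpi_finalize` with symbols.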
Cheers
Joseph
On 3/25/20 12:21 PM, Luis Cebamanos via devel wrote:
Hi ompi devs,
Any idea where I should start debugging this kind of error? This
comes from a plain "Hello World".
[r1i0n32:67074] *** Process received signal ***
[r1i0n32:67074] Signal: Segmentation fault (11)
[r1i0n32:67074] Signal code: Address not mapped (1)
[r1i0n32:67074] Failing at address: 0x30
[r1i0n32:67074] [ 0] /lib64/libpthread.so.0(+0xf100)[0x2aaaab00c100]
[r1i0n32:67074] [ 1]
/lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_errcode_finalize+0xeaf)[0x2aaaaad10a3f]
[r1i0n32:67074] [ 2]
/lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_finalize+0x750)[0x2aaaaad1ce30]
[r1i0n32:67074] [ 3] ./a.out[0x400abb]
[r1i0n32:67074] [ 4]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaab23ab15]
[r1i0n32:67074] [ 5] ./a.out[0x400939]
[r1i0n32:67074] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node r1i2n26 exited on
signal 11 (Segmentation fault).
Regards,
Luis