Hi Luis,

My first step is usually to configure Open MPI with `--enable-debug`, then recompile and reinstall. After that, run the program under DDT (or under gdb in one xterm per rank if DDT is not available and you have X forwarding on the nodes). When the segfault happens, you at least get proper symbols inside Open MPI, which may hint at the problem. You are welcome to post your findings here.
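
A rough sketch of those steps, in case it helps. The directory names (`OMPI_SRC`, `OMPI_DBG`, `HELLO_DIR`), the source file name `hello.c`, the rank count and the `make -j` value are all placeholders I made up, so adjust them to your setup:

```sh
# Hypothetical locations; replace with your own.
OMPI_SRC=$HOME/src/openmpi        # unpacked Open MPI source tree
OMPI_DBG=$HOME/opt/ompi-debug     # separate install prefix for the debug build
HELLO_DIR=$HOME/test_hello        # directory holding the "Hello World" source

# Rebuild and install Open MPI with debugging support enabled.
cd "$OMPI_SRC"
./configure --prefix="$OMPI_DBG" --enable-debug
make -j 8 all install

# Make sure the debug build is the one picked up at compile time and run time.
export PATH="$OMPI_DBG/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_DBG/lib:$LD_LIBRARY_PATH"

# Recompile the test with symbols, then start one gdb per rank in its own xterm.
cd "$HELLO_DIR"
mpicc -g -O0 hello.c -o a.out
mpirun -np 4 xterm -e gdb ./a.out
```

In each gdb window, type `run`, and once the segfault hits, `bt` should print a backtrace with file names and line numbers inside Open MPI.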

It would also help to have more info on the platform and the version of Open MPI you're running on :)

Cheers
Joseph

On 3/25/20 12:21 PM, Luis Cebamanos via devel wrote:
Hi ompi devs,

Any idea where I should start debugging this kind of error? It comes
from a plain "Hello World".

[r1i0n32:67074] *** Process received signal ***
[r1i0n32:67074] Signal: Segmentation fault (11)
[r1i0n32:67074] Signal code: Address not mapped (1)
[r1i0n32:67074] Failing at address: 0x30
[r1i0n32:67074] [ 0] /lib64/libpthread.so.0(+0xf100)[0x2aaaab00c100]
[r1i0n32:67074] [ 1] /lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_errcode_finalize+0xeaf)[0x2aaaaad10a3f]
[r1i0n32:67074] [ 2] /lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_finalize+0x750)[0x2aaaaad1ce30]
[r1i0n32:67074] [ 3] ./a.out[0x400abb]
[r1i0n32:67074] [ 4] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaab23ab15]
[r1i0n32:67074] [ 5] ./a.out[0x400939]
[r1i0n32:67074] *** End of error message ***

--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node r1i2n26 exited on
signal 11 (Segmentation fault).

Regards,
Luis

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
