Hi Joseph,

Thanks for this. I'll recompile with --with-debug and see what I get. The funny thing is that I seem to get the right output, but with this error. I suspect it could be happening in MPI_Finalize, but I do not see any obvious problems.
In any case, I am compiling Open MPI 4.1.0a1 with gcc 6.3.0 on an SGI ICE XA with PBS.

Regards,
Luis

On 25/03/2020 11:51, Joseph Schuchart via devel wrote:
> Hi Luis,
>
> My first step is usually to configure Open MPI with `--with-debug` and
> recompile/install. Then use DDT (or gdb inside an xterm per rank if
> DDT is not available and you have X-forwarding on the nodes). When the
> segfault happens, you at least get proper symbols inside Open MPI that
> may hint at the problem. You can post your findings here, of course.
>
> It would also help to have more info on the platform and the version
> of Open MPI you're running on :)
>
> Cheers
> Joseph
>
> On 3/25/20 12:21 PM, Luis Cebamanos via devel wrote:
>> Hi ompi devs,
>>
>> Any idea where I should start debugging this kind of error? This
>> comes from a plain "Hello World".
>>
>> [r1i0n32:67074] *** Process received signal ***
>> [r1i0n32:67074] Signal: Segmentation fault (11)
>> [r1i0n32:67074] Signal code: Address not mapped (1)
>> [r1i0n32:67074] Failing at address: 0x30
>> [r1i0n32:67074] [ 0] /lib64/libpthread.so.0(+0xf100)[0x2aaaab00c100]
>> [r1i0n32:67074] [ 1] /lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_errcode_finalize+0xeaf)[0x2aaaaad10a3f]
>> [r1i0n32:67074] [ 2] /lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_finalize+0x750)[0x2aaaaad1ce30]
>> [r1i0n32:67074] [ 3] ./a.out[0x400abb]
>> [r1i0n32:67074] [ 4] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaab23ab15]
>> [r1i0n32:67074] [ 5] ./a.out[0x400939]
>> [r1i0n32:67074] *** End of error message ***
>>
>> --------------------------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code. Per user-direction, the job has been aborted.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 2 with PID 0 on node r1i2n26 exited on
>> signal 11 (Segmentation fault).
>>
>> Regards,
>> Luis
>>
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.