Hi Joseph,

Thanks for this. I'll recompile with --enable-debug and see what I get.
The funny thing is that I seem to get the right output, just with this
error at the end. I suspect it happens in MPI_Finalize, but I do not
see any obvious problems.
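
Once a debug build reproduces the crash, the symbol+offset pairs in the
backtrace can be mapped to source lines. A minimal sketch, assuming gdb
is available and that the trace comes from the same build of libmpi
being inspected; the helper name `frames_to_gdb` is hypothetical and
not part of Open MPI:

```python
import re

# Matches frames such as:
#   /path/libmpi.so.0(ompi_mpi_finalize+0x750)[0x2aaaaad1ce30]
# Frames without a symbol name (e.g. ./a.out[0x400abb]) are skipped.
FRAME = re.compile(
    r"(?P<lib>\S+)\((?P<sym>\w+)\+(?P<off>0x[0-9a-fA-F]+)\)\[0x[0-9a-fA-F]+\]"
)

def frames_to_gdb(trace):
    """Return one gdb command line per frame that names a symbol."""
    template = "gdb -batch -ex 'list *({sym}+{off})' {lib}"
    return [template.format(**m.groupdict()) for m in FRAME.finditer(trace)]

# Frames copied from the backtrace in this thread.
trace = """
/lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_errcode_finalize+0xeaf)[0x2aaaaad10a3f]
/lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_finalize+0x750)[0x2aaaaad1ce30]
./a.out[0x400abb]
"""

for cmd in frames_to_gdb(trace):
    print(cmd)
```

The offsets are only meaningful against the exact library that produced
the trace, so the backtrace has to be regenerated after rebuilding.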

In any case, I am compiling Open MPI 4.1.0a1 with gcc 6.3.0 on an SGI ICE
XA with PBS.

Regards,
Luis

On 25/03/2020 11:51, Joseph Schuchart via devel wrote:
> Hi Luis,
>
> My first step is usually to configure Open MPI with `--enable-debug` and
> recompile/install. Then use DDT (or gdb inside an xterm per rank if
> DDT is not available and you have X forwarding on the nodes). When the
> segfault happens you at least get proper symbols inside Open MPI that
> may hint at the problem. You can post your findings here of course.
>
> It would also help to have more info on the platform and the version
> of Open MPI you're running on :)
>
> Cheers
> Joseph
>
> On 3/25/20 12:21 PM, Luis Cebamanos via devel wrote:
>> Hi ompi devs,
>>
>> Any idea where I should start debugging this kind of error? This
>> comes from a plain "Hello World".
>>
>> [r1i0n32:67074] *** Process received signal ***
>> [r1i0n32:67074] Signal: Segmentation fault (11)
>> [r1i0n32:67074] Signal code: Address not mapped (1)
>> [r1i0n32:67074] Failing at address: 0x30
>> [r1i0n32:67074] [ 0] /lib64/libpthread.so.0(+0xf100)[0x2aaaab00c100]
>> [r1i0n32:67074] [ 1] /lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_errcode_finalize+0xeaf)[0x2aaaaad10a3f]
>> [r1i0n32:67074] [ 2] /lustre/home/z04/us1/test_ompi/lib/libmpi.so.0(ompi_mpi_finalize+0x750)[0x2aaaaad1ce30]
>> [r1i0n32:67074] [ 3] ./a.out[0x400abb]
>> [r1i0n32:67074] [ 4] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaab23ab15]
>> [r1i0n32:67074] [ 5] ./a.out[0x400939]
>> [r1i0n32:67074] *** End of error message ***
>>
>> --------------------------------------------------------------------------
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code. Per user-direction, the job has been aborted.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 2 with PID 0 on node r1i2n26 exited on
>> signal 11 (Segmentation fault).
>>
>> Regards,
>> Luis
>>
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
