Dear Kenneth,
Thank you for your reply.
I am working on a Debian machine without InfiniBand. The batch
system is Torque/Maui; there is no Slurm.
I installed libpmi0 and libpmi0-dev (and created some symlinks so that
OpenMPI can find them), changed the OpenMPI easyconfig to use PMI, and
then recompiled it.
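For completeness, the change to the easyconfig was roughly the following
(this is a sketch, not the exact file; the PMI paths are my guesses at
Debian's layout and may need adjusting on other systems):

```python
# Excerpt from the modified OpenMPI easyconfig (sketch only): point
# configure at the system PMI headers and library. The paths below
# assume Debian's libpmi0-dev layout and may differ elsewhere.
configopts = '--with-pmi=/usr --with-pmi-libdir=/usr/lib/x86_64-linux-gnu'
```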
But when I use this OpenMPI installation, it still throws the same
error (see below). In addition, ompi_info claims that PMI support is
present:
---
ompi_info | grep pmi
    MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
    MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
    MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v2.1.1)
---
So I think the problem has nothing to do with FFTW, but only with
OpenMPI.
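One check that might narrow it down further: the error text mentions that
shared libraries required by a component may fail to load, so it could be
worth inspecting the ess/pmi plugin directly (the path below assumes an
EasyBuild-style install with the usual $EBROOTOPENMPI variable set):

```shell
# Locate the ess/pmi MCA plugin inside the OpenMPI installation and
# check whether all of its shared-library dependencies resolve;
# any "not found" line would point at the missing library.
ldd "$EBROOTOPENMPI"/lib/openmpi/mca_ess_pmi.so | grep -i 'pmi\|not found'
```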
Regards,
Holger
mpirun -n 6 ./montecarlo
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

  Host:      fb11-nx-main
  Framework: ess
  Component: pmi
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[fb11-nx-main:27614] *** An error occurred in MPI_Init
[fb11-nx-main:27614] *** on a NULL communicator
[fb11-nx-main:27614] *** Unknown error
[fb11-nx-main:27614] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly. You should
double check that everything has shut down cleanly.
  Reason:     Before MPI_INIT completed
  Local host: fb11-nx-main
  PID:        27614
--------------------------------------------------------------------------
[fb11-nx-main:27614] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 116
On 11.10.2017 10:54, Kenneth Hoste wrote:
Dear Holger,
In what kind of environment are you trying to build FFTW here?
You may have to rebuild OpenMPI while specifying the correct location
for PMI, cf.
https://github.com/easybuilders/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.1-GCC-6.4.0-2.28.eb#L22
regards,
Kenneth
On 10/10/2017 17:40, Holger Angenent wrote:
Dear all,
I am trying to compile foss-2017a on a Debian system. OpenMPI builds
successfully, but when FFTW uses it for its checks, it throws an
error (see below). The same happens with foss-2017b. On a CentOS 7
based machine, both compile without error.
Do you have any idea whether an OS dependency might be missing? Has
anybody succeeded in building foss on Debian?
Best regards,
Holger
This is the failing step:
mpirun -np 1
/home/ms/f/fhms166656/.local/easybuild/build/FFTW/3.3.6/gompi-2017a/fftw-3.3.6-pl2/mpi/mpi-bench
--verbose=1 --verify 'ofrd5x4x12v6' --verify 'ifrd5x4x12v6'
--verify 'obcd5x4x12v6' --verify 'ibcd5x4x12v6' --verify
'ofcd5x4x12v6' --verify 'ifcd5x4x12v6' --verify 'ok7hx13e10x13b'
--verify 'ik7hx13e10x13b' --verify 'obr[7x7v1' --verify 'ibr[7x7v1'
--verify 'obc[7x7v1' --verify 'ibc[7x7v1' --verify 'ofc[7x7v1'
--verify 'ifc[7x7v1' --verify 'ok]8hx6o01x4hx11e10' --verify
'ik]8hx6o01x4hx11e10' --verify 'obr9x6' --verify 'ibr9x6' --verify
'ofr9x6' --verify 'ifr9x6' --verify 'obc9x6' --verify 'ibc9x6'
--verify 'ofc9x6' --verify 'ifc9x6' --verify 'ok6e10x12e10' --verify
'ik6e10x12e10' --verify 'ofr]9x4x9x10' --verify 'ifr]9x4x9x10'
--verify 'obc]9x4x9x10' --verify 'ibc]9x4x9x10' --verify
'ofc]9x4x9x10' --verify 'ifc]9x4x9x10' --verify 'ofrd]10x8x8x12'
--verify 'ifrd]10x8x8x12' --verify 'obcd]10x8x8x12' --verify
'ibcd]10x8x8x12' --verify 'ofcd]10x8x8x12' --verify 'ifcd]10x8x8x12'
--verify 'ofrd]2x8x8v7' --verify 'ifrd]2x8x8v7' --verify
'obcd]2x8x8v7' --verify 'ibcd]2x8x8v7'
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

  Host:      fb11-nx-main
  Framework: ess
  Component: pmi
--------------------------------------------------------------------------
[fb11-nx-main:23484] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 116
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):