Holger,

For SLURM, you need to tell the batch system to actually use the PMI
interface (either by running 'srun --mpi=pmi2' or by setting
'MpiDefault=pmi2' in slurm.conf).  Maybe something similar is required
for Torque/Maui?
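
For SLURM, for example, you can list the PMI plugins that are available and
launch with one of them (pmi2 is only the usual plugin name, your site may
differ):

  srun --mpi=list               # list the MPI/PMI plugins SLURM knows about
  srun --mpi=pmi2 -n 6 ./montecarlo

or, cluster-wide, set

  MpiDefault=pmi2

in slurm.conf.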

Hth,
Markus


On 10/11/2017 12:52 PM, Holger Angenent wrote:
> Dear Kenneth,
>
> thank you for your reply.
> I am working on a Debian machine without InfiniBand. The batch system is
> Torque/Maui; there is no SLURM.
> I installed libpmi0 and libpmi0-dev (and created some symlinks so that
> OpenMPI can find them), changed the easyconfig file of OpenMPI to use PMI,
> and then recompiled it.
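>
> For reference, dpkg -L shows where libpmi0/libpmi0-dev actually put the PMI
> header and library, i.e. what the symlinks (or OpenMPI's configure paths)
> have to point at (the commented paths are only typical locations, not taken
> from this machine):
> ---
> dpkg -L libpmi0-dev | grep -E 'pmi\.h|libpmi'
> # typically /usr/include/slurm/pmi.h and /usr/lib/x86_64-linux-gnu/libpmi.so
> ---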
>
> But when I use this OpenMPI installation, it still throws the same error
> (see below).
>
> In addition, it claims to have PMI support:
> ---
> ompi_info | grep pmi
>                 MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
>                 MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
>                  MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v2.1.1)
> ---
>
> So I think the problem is not related to FFTW at all, but to OpenMPI itself.
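>
> To rule out the application itself, the same failure should already show up
> with a trivial run, and since the help text mentions shared libraries that
> cannot be found/loaded, checking the ess/pmi plugin directly may also help
> (the path assumes an EasyBuild-installed OpenMPI module; adjust as needed):
> ---
> mpirun -n 1 /bin/hostname
> ldd $EBROOTOPENMPI/lib/openmpi/mca_ess_pmi.so | grep 'not found'
> ---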
>
> Regards,
> Holger
>
>
>
>
> mpirun -n 6 ./montecarlo
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host:      fb11-nx-main
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_ess_base_open failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: orte_init failed
>   --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> [fb11-nx-main:27614] *** An error occurred in MPI_Init
> [fb11-nx-main:27614] *** on a NULL communicator
> [fb11-nx-main:27614] *** Unknown error
> [fb11-nx-main:27614] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly.  You should
> double check that everything has shut down cleanly.
>
>   Reason:     Before MPI_INIT completed
>   Local host: fb11-nx-main
>   PID:        27614
> --------------------------------------------------------------------------
> [fb11-nx-main:27614] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 116
>
>
>
> On 11.10.2017 10:54, Kenneth Hoste wrote:
>> Dear Holger,
>>
>> In what kind of environment are you trying to build FFTW here?
>>
>> You may have to rebuild OpenMPI while specifying the correct location
>> for PMI, cf.
>> https://github.com/easybuilders/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.1-GCC-6.4.0-2.28.eb#L22
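>>
>> The change essentially boils down to pointing Open MPI's configure at the
>> PMI installation, along these lines in the easyconfig (a sketch only; the
>> PMI prefix is an assumption for a Debian box, see the linked easyconfig
>> for the actual line):
>>
>>   # adjust the prefix to wherever pmi.h / libpmi.so are installed
>>   configopts += ' --with-pmi=/usr '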
>>
>>
>>
>> regards,
>>
>> Kenneth
>>
>> On 10/10/2017 17:40, Holger Angenent wrote:
>>> Dear all,
>>>
>>> I am trying to compile foss-2017a on a Debian system. OpenMPI builds
>>> through to the end, but when FFTW uses it for its checks, it throws an
>>> error (see below). The same happens with foss-2017b. On a CentOS 7
>>> based machine, both compile without error.
>>>
>>> Do you have any idea whether an OS dependency might be missing? Did
>>> anybody succeed in building foss on Debian?
>>>
>>> Best regards,
>>>
>>> Holger
>>>
>>>
>>>
>>> This is the step failing:
>>>
>>> mpirun -np 1
>>> /home/ms/f/fhms166656/.local/easybuild/build/FFTW/3.3.6/gompi-2017a/fftw-3.3.6-pl2/mpi/mpi-bench
>>> --verbose=1   --verify 'ofrd5x4x12v6' --verify 'ifrd5x4x12v6'
>>> --verify 'obcd5x4x12v6' --verify 'ibcd5x4x12v6' --verify
>>> 'ofcd5x4x12v6' --verify 'ifcd5x4x12v6' --verify 'ok7hx13e10x13b'
>>> --verify 'ik7hx13e10x13b' --verify 'obr[7x7v1' --verify 'ibr[7x7v1'
>>> --verify 'obc[7x7v1' --verify 'ibc[7x7v1' --verify 'ofc[7x7v1'
>>> --verify 'ifc[7x7v1' --verify 'ok]8hx6o01x4hx11e10' --verify
>>> 'ik]8hx6o01x4hx11e10' --verify 'obr9x6' --verify 'ibr9x6' --verify
>>> 'ofr9x6' --verify 'ifr9x6' --verify  'obc9x6' --verify 'ibc9x6'
>>> --verify 'ofc9x6' --verify 'ifc9x6' --verify 'ok6e10x12e10' --verify
>>> 'ik6e10x12e10' --verify 'ofr]9x4x9x10' --verify 'ifr]9x4x9x10'
>>> --verify 'obc]9x4x9x10' --verify 'ibc]9x4x9x10' --verify
>>> 'ofc]9x4x9x10' --verify 'ifc]9x4x9x10' --verify 'ofrd]10x8x8x12'
>>> --verify 'ifrd]10x8x8x12' --verify 'obcd]10x8x8x12' --verify
>>> 'ibcd]10x8x8x12' --verify 'ofcd]10x8x8x12' --verify 'ifcd]10x8x8x12'
>>> --verify 'ofrd]2x8x8v7' --verify 'ifrd]2x8x8v7' --verify
>>> 'obcd]2x8x8v7' --verify 'ibcd]2x8x8v7'
>>> --------------------------------------------------------------------------
>>>
>>> A requested component was not found, or was unable to be opened. This
>>> means that this component is either not installed or is unable to be
>>> used on your system (e.g., sometimes this means that shared libraries
>>> that the component requires are unable to be found/loaded). Note that
>>> Open MPI stopped checking at the first component that it did not find.
>>>
>>> Host:      fb11-nx-main
>>> Framework: ess
>>> Component: pmi
>>> --------------------------------------------------------------------------
>>>
>>> [fb11-nx-main:23484] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in
>>> file runtime/orte_init.c at line 116
>>> --------------------------------------------------------------------------
>>>
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   orte_ess_base_open failed
>>>   --> Returned value Error (-1) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>>
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>
>

--
Dr. Markus Geimer
Juelich Supercomputing Centre
Institute for Advanced Simulation
Forschungszentrum Juelich GmbH
52425 Juelich, Germany

Phone:  +49-2461-61-1773
Fax:    +49-2461-61-6656
E-Mail: [email protected]
WWW:    http://www.fz-juelich.de/jsc


