Holger, For SLURM, you need to tell the batch system to actually use the PMI interface (either by using 'srun --mpi=pmi' or setting 'MpiDefault=pmi' in slurm.conf). Maybe something similar is required for Torque/Maui?
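For Slurm, that would look roughly like this (a sketch; the exact PMI flavour — pmi, pmi2 or pmix — depends on how Slurm and Open MPI were built, `srun --mpi=list` shows what is available):

```shell
# Launch through Slurm's PMI interface directly, instead of via mpirun:
srun --mpi=pmi2 -n 6 ./montecarlo

# ...or make it the default for all jobs by setting, in slurm.conf:
#   MpiDefault=pmi2
```

For Torque/Maui there is no PMI server, so Open MPI instead talks to the TM interface; that requires Open MPI to have been configured with `--with-tm` (pointing at the Torque installation prefix if it is in a non-standard location).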
Hth,
Markus

On 10/11/2017 12:52 PM, Holger Angenent wrote:
> Dear Kenneth,
>
> thank you for your reply.
> I am working on a Debian machine without InfiniBand. The batch system
> is Torque/Maui, there is no Slurm. I installed libpmi0 and libpmi0-dev
> (and created some symlinks to make Open MPI find it), changed the
> easyconfig file of OpenMPI to use PMI, and recompiled it.
>
> But when I use this OpenMPI installation, it still throws the same
> error (see below).
>
> In addition, it claims to have PMI:
> ---
> ompi_info | grep pmi
>     MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
>     MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
>     MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v2.1.1)
> ---
>
> So I think the problem has nothing to do with FFTW, but only with OpenMPI.
>
> Regards,
> Holger
>
>
> mpirun -n 6 ./montecarlo
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host:      fb11-nx-main
> Framework: ess
> Component: pmi
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   orte_ess_base_open failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   ompi_mpi_init: orte_init failed
>   --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> [fb11-nx-main:27614] *** An error occurred in MPI_Init
> [fb11-nx-main:27614] *** on a NULL communicator
> [fb11-nx-main:27614] *** Unknown error
> [fb11-nx-main:27614] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly. You should
> double check that everything has shut down cleanly.
>
> Reason:     Before MPI_INIT completed
> Local host: fb11-nx-main
> PID:        27614
> --------------------------------------------------------------------------
> [fb11-nx-main:27614] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 116
>
>
> On 11.10.2017 10:54, Kenneth Hoste wrote:
>> Dear Holger,
>>
>> In what kind of environment are you trying to build FFTW here?
>>
>> You may have to rebuild OpenMPI while specifying the correct location
>> for PMI, cfr.
>> https://github.com/easybuilders/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.1-GCC-6.4.0-2.28.eb#L22
>>
>> regards,
>>
>> Kenneth
>>
>> On 10/10/2017 17:40, Holger Angenent wrote:
>>> Dear all,
>>>
>>> I am trying to compile foss-2017a on a Debian system. OpenMPI builds
>>> to the end, but when FFTW is using it for its checks, it throws an
>>> error (see below). The same happens with foss-2017b. On a CentOS 7
>>> based machine, both compile without error.
>>>
>>> Do you have an idea if an OS dependency might be missing? Did
>>> anybody succeed in building foss on Debian?
>>>
>>> Best regards,
>>>
>>> Holger
>>>
>>>
>>> This is the step failing:
>>>
>>> mpirun -np 1
>>> /home/ms/f/fhms166656/.local/easybuild/build/FFTW/3.3.6/gompi-2017a/fftw-3.3.6-pl2/mpi/mpi-bench
>>> --verbose=1 --verify 'ofrd5x4x12v6' --verify 'ifrd5x4x12v6'
>>> --verify 'obcd5x4x12v6' --verify 'ibcd5x4x12v6' --verify
>>> 'ofcd5x4x12v6' --verify 'ifcd5x4x12v6' --verify 'ok7hx13e10x13b'
>>> --verify 'ik7hx13e10x13b' --verify 'obr[7x7v1' --verify 'ibr[7x7v1'
>>> --verify 'obc[7x7v1' --verify 'ibc[7x7v1' --verify 'ofc[7x7v1'
>>> --verify 'ifc[7x7v1' --verify 'ok]8hx6o01x4hx11e10' --verify
>>> 'ik]8hx6o01x4hx11e10' --verify 'obr9x6' --verify 'ibr9x6' --verify
>>> 'ofr9x6' --verify 'ifr9x6' --verify 'obc9x6' --verify 'ibc9x6'
>>> --verify 'ofc9x6' --verify 'ifc9x6' --verify 'ok6e10x12e10' --verify
>>> 'ik6e10x12e10' --verify 'ofr]9x4x9x10' --verify 'ifr]9x4x9x10'
>>> --verify 'obc]9x4x9x10' --verify 'ibc]9x4x9x10' --verify
>>> 'ofc]9x4x9x10' --verify 'ifc]9x4x9x10' --verify 'ofrd]10x8x8x12'
>>> --verify 'ifrd]10x8x8x12' --verify 'obcd]10x8x8x12' --verify
>>> 'ibcd]10x8x8x12' --verify 'ofcd]10x8x8x12' --verify 'ifcd]10x8x8x12'
>>> --verify 'ofrd]2x8x8v7' --verify 'ifrd]2x8x8v7' --verify
>>> 'obcd]2x8x8v7' --verify 'ibcd]2x8x8v7'
>>> --------------------------------------------------------------------------
>>> A requested component was not found, or was unable to be opened. This
>>> means that this component is either not installed or is unable to be
>>> used on your system (e.g., sometimes this means that shared libraries
>>> that the component requires are unable to be found/loaded). Note that
>>> Open MPI stopped checking at the first component that it did not find.
>>>
>>> Host:      fb11-nx-main
>>> Framework: ess
>>> Component: pmi
>>> --------------------------------------------------------------------------
>>> [fb11-nx-main:23484] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in
>>> file runtime/orte_init.c at line 116
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   orte_ess_base_open failed
>>>   --> Returned value Error (-1) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>

-- 
Dr. Markus Geimer
Juelich Supercomputing Centre
Institute for Advanced Simulation
Forschungszentrum Juelich GmbH
52425 Juelich, Germany
Phone:  +49-2461-61-1773
Fax:    +49-2461-61-6656
E-Mail: [email protected]
WWW:    http://www.fz-juelich.de/jsc
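A follow-up note on debugging this kind of failure: the banner in the logs above says the `ess` framework stopped at the `pmi` component, which typically means a library that component depends on (here, the PMI library) cannot be resolved at run time. Open MPI exposes per-framework MCA verbosity parameters that report why a component fails to open; a minimal sketch (the `$EBROOTOPENMPI` path assumes an EasyBuild-provided Open MPI, and the `ldd` check only applies if components were built as shared objects rather than with `--disable-dlopen`):

```shell
# Ask Open MPI to explain why the ess framework's components fail to open
# (every framework has a corresponding <framework>_base_verbose parameter):
mpirun --mca ess_base_verbose 10 -n 1 ./montecarlo

# If the pmi component was built as a dynamically loaded plugin, check
# whether all of its shared-library dependencies resolve:
ldd $EBROOTOPENMPI/lib/openmpi/mca_ess_pmi.so
```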

