Thank you for your help,

Holger
On 11.10.2017 13:17, Markus Geimer wrote:
Holger,

For SLURM, you need to tell the batch system to actually use the PMI interface (either by using 'srun --mpi=pmi' or setting 'MpiDefault=pmi' in slurm.conf). Maybe something similar is required for Torque/Maui?

Hth,
Markus

On 10/11/2017 12:52 PM, Holger Angenent wrote:

Dear Kenneth,

thank you for your reply. I am working on a Debian machine without InfiniBand. The batch system is Torque/Maui; there is no SLURM. I installed libpmi0 and libpmi0-dev (and created some symlinks so that OpenMPI finds it) and changed the easyconfig file of OpenMPI to use PMI. Then I recompiled it. But when I use this OpenMPI installation, it still throws the same error (see below). In addition, it claims to have PMI support:

---
ompi_info | grep pmi
    MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
    MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.1)
    MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v2.1.1)
---

So I think the problem has nothing to do with FFTW, but only with OpenMPI.

Regards,
Holger

mpirun -n 6 ./montecarlo
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This means that this component is either not installed or is unable to be used on your system (e.g., sometimes this means that shared libraries that the component requires are unable to be found/loaded). Note that Open MPI stopped checking at the first component that it did not find.

Host:      fb11-nx-main
Framework: ess
Component: pmi
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems.
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  orte_ess_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[fb11-nx-main:27614] *** An error occurred in MPI_Init
[fb11-nx-main:27614] *** on a NULL communicator
[fb11-nx-main:27614] *** Unknown error
[fb11-nx-main:27614] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--------------------------------------------------------------------------
An MPI process is aborting at a time when it cannot guarantee that all of its peer processes in the job will be killed properly. You should double check that everything has shut down cleanly.

  Reason:     Before MPI_INIT completed
  Local host: fb11-nx-main
  PID:        27614
--------------------------------------------------------------------------
[fb11-nx-main:27614] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 116

On 11.10.2017 10:54, Kenneth Hoste wrote:

Dear Holger,

In what kind of environment are you trying to build FFTW here? You may have to rebuild OpenMPI while specifying the correct location for PMI, cfr. https://github.com/easybuilders/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/o/OpenMPI/OpenMPI-2.1.1-GCC-6.4.0-2.28.eb#L22 .
regards,

Kenneth

On 10/10/2017 17:40, Holger Angenent wrote:

Dear all,

I am trying to compile foss-2017a on a Debian system. OpenMPI builds to the end, but when FFTW is using it for its checks, it throws an error (see below). The same happens with foss-2017b. On a CentOS 7 based machine, both compile without error. Do you have an idea whether an OS dependency might be missing? Did anybody succeed in building foss on Debian?

Best regards,
Holger

This is the step that fails:

mpirun -np 1 /home/ms/f/fhms166656/.local/easybuild/build/FFTW/3.3.6/gompi-2017a/fftw-3.3.6-pl2/mpi/mpi-bench --verbose=1 \
  --verify 'ofrd5x4x12v6' --verify 'ifrd5x4x12v6' --verify 'obcd5x4x12v6' --verify 'ibcd5x4x12v6' \
  --verify 'ofcd5x4x12v6' --verify 'ifcd5x4x12v6' --verify 'ok7hx13e10x13b' --verify 'ik7hx13e10x13b' \
  --verify 'obr[7x7v1' --verify 'ibr[7x7v1' --verify 'obc[7x7v1' --verify 'ibc[7x7v1' \
  --verify 'ofc[7x7v1' --verify 'ifc[7x7v1' --verify 'ok]8hx6o01x4hx11e10' --verify 'ik]8hx6o01x4hx11e10' \
  --verify 'obr9x6' --verify 'ibr9x6' --verify 'ofr9x6' --verify 'ifr9x6' \
  --verify 'obc9x6' --verify 'ibc9x6' --verify 'ofc9x6' --verify 'ifc9x6' \
  --verify 'ok6e10x12e10' --verify 'ik6e10x12e10' --verify 'ofr]9x4x9x10' --verify 'ifr]9x4x9x10' \
  --verify 'obc]9x4x9x10' --verify 'ibc]9x4x9x10' --verify 'ofc]9x4x9x10' --verify 'ifc]9x4x9x10' \
  --verify 'ofrd]10x8x8x12' --verify 'ifrd]10x8x8x12' --verify 'obcd]10x8x8x12' --verify 'ibcd]10x8x8x12' \
  --verify 'ofcd]10x8x8x12' --verify 'ifcd]10x8x8x12' --verify 'ofrd]2x8x8v7' --verify 'ifrd]2x8x8v7' \
  --verify 'obcd]2x8x8v7' --verify 'ibcd]2x8x8v7'
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This means that this component is either not installed or is unable to be used on your system (e.g., sometimes this means that shared libraries that the component requires are unable to be found/loaded). Note that Open MPI stopped checking at the first component that it did not find.
Host:      fb11-nx-main
Framework: ess
Component: pmi
--------------------------------------------------------------------------
[fb11-nx-main:23484] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 116
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  orte_ess_base_open failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

--
Dr. Markus Geimer
Juelich Supercomputing Centre
Institute for Advanced Simulation
Forschungszentrum Juelich GmbH
52425 Juelich, Germany
Phone: +49-2461-61-1773
Fax: +49-2461-61-6656
E-Mail: [email protected]
WWW: http://www.fz-juelich.de/jsc

------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
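The rebuild Kenneth suggests amounts to pointing OpenMPI's configure at the PMI headers and library via the easyconfig. A minimal sketch of what the PMI-related line might look like, assuming a Debian system where libpmi0-dev installs the library under /usr/lib/x86_64-linux-gnu and the headers under /usr/include/slurm (both paths are assumptions; configure looks for pmi.h under <dir>/include, which is why Holger's symlinks may be needed):

```python
# Sketch of PMI-related configure options for an OpenMPI easyconfig,
# cf. OpenMPI-2.1.1-GCC-6.4.0-2.28.eb. Both directories are assumptions
# for Debian's libpmi0-dev; adjust to wherever pmi.h and libpmi.so live.
configopts = '--with-pmi=/usr '
configopts += '--with-pmi-libdir=/usr/lib/x86_64-linux-gnu '
```

After rebuilding, `ompi_info | grep pmi` should still list the pmi/pmix components, but the real test is whether a plain `mpirun` run no longer fails in orte_init.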
--
Westfälische Wilhelms-Universität Münster (WWU)
Zentrum für Informationsverarbeitung (ZIV)
Röntgenstraße 7-13
48149 Münster
+49-(0)251-83 31569
[email protected]
www.uni-muenster.de/ZIV
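The "component was not found, or was unable to be opened" message for ess/pmi usually means the plugin itself exists but its shared-library dependencies do not resolve at run time. One way to check this directly is to run ldd on the component (a sketch; the path is an assumption, using the EasyBuild-provided $EBROOTOPENMPI prefix of the loaded OpenMPI module):

```shell
# Check whether the ess/pmi plugin can resolve every shared library it
# links against; any "not found" line names the library OpenMPI cannot load.
ldd "$EBROOTOPENMPI/lib/openmpi/mca_ess_pmi.so" | grep 'not found'
```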