On 12/6/19 3:59 PM, Will Furnass wrote:
> Hi all,
>
> We're using EasyBuild 4.0.0 on a Centos 7 + Slurm 19.05 system. We're
> able to build and use 'foss-2019a' EasyBuild toolchain (which includes
> OpenMPI 3.1.3) using the libpmi2 provided by the Slurm folks (by
> uncommenting "configopts = '--with-slurm --with-pmi" in
> OpenMPI-3.1.3-GCC-8.2.0-2.31.1.eb). However, when trying to build R
> 3.6.0 and Rmpi with that toolchain we get:
1 - Did you build the pmi and/or pmi2 parts of Slurm? At least pmi now
lives in contribs and is not built by default.
2 - If I remember correctly, --with-pmi as a configure option to
OpenMPI means PMI-1, not PMI-2.
3 - I'd strongly suggest using PMIx instead.
4 - Take a look at
https://github.com/easybuilders/easybuild-framework/pull/2777 to see how
we use hooks to add support for PMIx on the fly when building OpenMPI
through EasyBuild; a rough sketch of the idea follows below.
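
To illustrate (a minimal sketch only, not what the PR does verbatim;
the /opt/pmix prefix is a placeholder for wherever your site installs
PMIx):

  # hooks.py -- append PMIx configure options whenever OpenMPI is built
  def pre_configure_hook(self, *args, **kwargs):
      """Called by EasyBuild just before the configure step of a build."""
      if self.name == 'OpenMPI':
          # update() appends to whatever configopts the easyconfig already sets
          self.cfg.update('configopts', '--with-slurm --with-pmix=/opt/pmix')

and then point eb at it with --hooks=hooks.py when rebuilding the
toolchain.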
> ...
> ** building package indices
> ** testing if installed package can be loaded from temporary location
> --------------------------------------------------------------------------
> PMI2_Init failed to intialize. Return code: 14
>
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> The application appears to have been direct launched using "srun",
> but OMPI was not built with SLURM's PMI support and therefore cannot
> execute. There are several options for building PMI support under
> SLURM, depending upon the SLURM version you are using:
>
> version 16.05 or later: you can use SLURM's PMIx support. This
> requires that you configure and build SLURM --with-pmix.
>
> Versions earlier than 16.05: you must use either SLURM's PMI-1 or
> PMI-2 support. SLURM builds PMI-1 by default, or you can manually
> install PMI-2. You must then build Open MPI using --with-pmi pointing
> to the SLURM PMI library location.
>
> Please configure as appropriate and try again.
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [bessemer-node019.shef.ac.uk:199102] Local abort before MPI_INIT
> completed completed successfully, but am not able to aggregate error
> messages, and not able to guarantee that all other processes were
> killed!
> ERROR: loading failed
> ...
>
> We also see a similar error when trying to build GROMACS 2019.3.
>
> We have 'MpiDefault=pmi2' in our Slurm config. Interestingly, if we
> override this with '--mpi=openmpi' (e.g. run "srun -A cstest
> --mpi=openmpi --cpus-per-task=4 --pty /bin/bash -i" to get an
> interactive session, then run eb from there), EasyBuild is able to
> build Rmpi. However, I'm curious to know how/why we can't use pmi2
> here. Anyone got any thoughts?
>
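A quick check here: "srun --mpi=list" shows which PMI plugin types your
Slurm build actually supports. My guess as to why --mpi=openmpi "works"
is that srun then skips the PMI-2 wire-up entirely, so the Rmpi load
test falls back to a plain singleton MPI_Init instead of trying, and
failing, to initialize against libpmi2. That masks the problem rather
than fixing it.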
> Apologies if you think this query should have been posted to the Slurm
> users list instead; I thought I'd ask here first in case anyone's
> encountered anything similar when using OpenMPI and R
> easyconfigs/easyblocks.
>
> Regards,
>
> Will
>
--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: [email protected] Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se