We use the foss-2016b toolchain, and we need OpenMPI to be built with Slurm resource manager support. It seems that the foss-2016b build doesn't include Slurm:

# ml OpenMPI/1.10.3-GCC-5.4.0-2.26
# ompi_info | egrep -i 'slurm|pmi'
                 MCA ess: slurm (MCA v2.0.0, API v3.0.0, Component v1.10.3)
                 MCA plm: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.3)
                 MCA ras: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.3)
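
For what it's worth, the three lines above do list a slurm component for the ess, plm and ras frameworks, while nothing with "pmi" in its name shows up. Below is a minimal sketch that makes that reading explicit by parsing ompi_info-style text. The function names (mca_components, summarize) are made up for illustration and are not part of Open MPI or EasyBuild; the sample text is copied from the output above.

```python
import re

# Sample text copied from the ompi_info output quoted in this post.
SAMPLE = """\
                 MCA ess: slurm (MCA v2.0.0, API v3.0.0, Component v1.10.3)
                 MCA plm: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.3)
                 MCA ras: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.3)
"""

# Matches lines such as "MCA ess: slurm (MCA v2.0.0, ...)".
MCA_LINE = re.compile(r"^\s*MCA\s+(\S+):\s+(\S+)")


def mca_components(text):
    """Return (framework, component) pairs found in ompi_info-style text."""
    pairs = []
    for line in text.splitlines():
        match = MCA_LINE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def summarize(text):
    """List the frameworks whose component name mentions slurm or pmi."""
    summary = {"slurm": [], "pmi": []}
    for framework, component in mca_components(text):
        for key in summary:
            if key in component.lower():
                summary[key].append(framework)
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run this against a live installation, the text would have to come from the real ompi_info command (for example captured to a file first); that step is left out here on purpose.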

Our multi-node MPI jobs fail, and I suspect that this is due to the missing Slurm support.

Slurm seems to require a build of OpenMPI with 1) --with-pmi and/or 2) --with-slurm. References:

1) https://www.mail-archive.com/easybuild@lists.ugent.be/msg01975.html
2) https://www.open-mpi.org/faq/?category=slurm

I tried making a copy of the EB file OpenMPI-1.10.3-GCC-5.4.0-2.26.eb and appending a line:

configopts += '--with-slurm --with-pmi '

and rebuilding the module with eb --force. Unfortunately, the resulting module seems *not* to include my updated configopts (looking at the file $EASYBUILD_PREFIX/ebfiles_repo/OpenMPI/OpenMPI-1.10.3-GCC-5.4.0-2.26.eb).
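
The manual check described above (looking at the archived easyconfig for the new flags) could be scripted roughly as follows. This is only a sketch under simple assumptions: it does a plain text search on lines that start with "configopts", so it would miss options spread over continuation lines, and the helper name missing_flags is hypothetical, not an EasyBuild function. The path to the easyconfig is passed on the command line.

```python
import sys

# Flags this post is trying to add to the OpenMPI build.
WANTED_FLAGS = ("--with-slurm", "--with-pmi")


def missing_flags(easyconfig_text, wanted=WANTED_FLAGS):
    """Return the wanted flags that appear on no line starting with 'configopts'."""
    configopts_lines = [
        line for line in easyconfig_text.splitlines()
        if line.strip().startswith("configopts")
    ]
    joined = " ".join(configopts_lines)
    return [flag for flag in wanted if flag not in joined]


if __name__ == "__main__":
    # Usage: python check_configopts.py /path/to/archived/easyconfig.eb
    with open(sys.argv[1]) as handle:
        missing = missing_flags(handle.read())
    if missing:
        print("missing from configopts: " + " ".join(missing))
    else:
        print("all requested flags are present in configopts")
```

Running it on both the edited copy and the file under $EASYBUILD_PREFIX/ebfiles_repo would show whether the two differ, which may help tell whether the rebuild actually used the edited copy.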

Question: How do I rebuild the OpenMPI module with proper Slurm support?

Question: Can Slurm support please be included in future versions of the OpenMPI module in the foss-201x toolchain?

Thanks a lot,
Ole
