Hi Ole,
On 26/10/16 16:03, Ole Holm Nielsen wrote:
We use the foss-2016b toolchain, and we need OpenMPI to be built with
Slurm resource manager support. It seems that the foss-2016b build
doesn't include Slurm:
# ml OpenMPI/1.10.3-GCC-5.4.0-2.26
# ompi_info | egrep -i 'slurm|pmi'
MCA ess: slurm (MCA v2.0.0, API v3.0.0, Component v1.10.3)
MCA plm: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.3)
MCA ras: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.3)
Our multi-node MPI jobs fail miserably, and I surmise that this is due
to the missing Slurm support.
Slurm seems to require a build of OpenMPI with 1) --with-pmi and/or 2)
--with-slurm. References:
1) https://www.mail-archive.com/easybuild@lists.ugent.be/msg01975.html
2) https://www.open-mpi.org/faq/?category=slurm
I tried making a copy of the EB file OpenMPI-1.10.3-GCC-5.4.0-2.26.eb
and appending a line:
configopts += '--with-slurm --with-pmi '
and rebuilding the module with eb --force. Unfortunately, the
resulting module seems *not* to include my updated configopts (looking
at the file
$EASYBUILD_PREFIX/ebfiles_repo/OpenMPI/OpenMPI-1.10.3-GCC-5.4.0-2.26.eb).
Question: How do I rebuild the OpenMPI module with proper Slurm support?
Rebuilding with --force should work, so for some reason your customized
easyconfig was not picked up...
How did you provide it to EasyBuild exactly? Was it available in the
local directory where you ran the 'eb' command?
You can verify that the right easyconfig is picked up via a dry run
like: "eb OpenMPI-1.10.3-GCC-5.4.0-2.26.eb -Df", which will print the
path to the easyconfig files used.
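Once the rebuild has picked up the right easyconfig, the ompi_info check
quoted above can be scripted. The following is only a minimal sketch: the
helper name mca_components and the SAMPLE text are made up for
illustration (SAMPLE is the output quoted above), and the regular
expression assumes component lines have the form
"MCA <framework>: <component> (...".

```python
import re

# Matches component lines such as:
#   MCA ess: slurm (MCA v2.0.0, API v3.0.0, Component v1.10.3)
# Wrapped continuation lines (e.g. "v1.10.3)") do not match and are skipped.
MCA_LINE = re.compile(r"^\s*MCA\s+(\S+):\s+(\S+)\s+\(")


def mca_components(ompi_info_output):
    """Map each MCA component name to the sorted frameworks that provide it."""
    found = {}
    for line in ompi_info_output.splitlines():
        match = MCA_LINE.match(line)
        if match:
            framework, component = match.groups()
            found.setdefault(component, set()).add(framework)
    return {name: sorted(frameworks) for name, frameworks in found.items()}


# Hypothetical input: the ompi_info output quoted earlier in this thread.
SAMPLE = """\
MCA ess: slurm (MCA v2.0.0, API v3.0.0, Component
v1.10.3)
MCA plm: slurm (MCA v2.0.0, API v2.0.0, Component
v1.10.3)
MCA ras: slurm (MCA v2.0.0, API v2.0.0, Component
v1.10.3)
"""

components = mca_components(SAMPLE)
print("slurm frameworks:", components.get("slurm", []))
print("pmi frameworks:", components.get("pmi", []))
```

In practice the text would come from running ompi_info with the rebuilt
module loaded, rather than from a pasted sample.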
Question: Can Slurm support please be included in future versions of
the OpenMPI module in the foss-201x tool chain?
This is left as a site-specific customization, since hard-coding
--with-slurm would make the installation fail on any system that
does not have Slurm.
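For reference, the site-local change boils down to something like the
fragment below (easyconfigs use Python syntax). This is a sketch, not the
actual file: the first configopts line is a placeholder for whatever the
stock OpenMPI-1.10.3-GCC-5.4.0-2.26.eb already sets, and only the
'+=' line is the customization discussed in this thread.

```python
# Fragment of a site-local copy of OpenMPI-1.10.3-GCC-5.4.0-2.26.eb.

# Placeholder (assumption): stands in for the configopts line that the
# stock easyconfig already defines; keep the original line as it is.
configopts = '--enable-shared '

# Site-specific addition. It must come after the line above, since '+='
# needs configopts to be defined already; the trailing space keeps the
# string safe to append to later.
configopts += '--with-slurm --with-pmi '
```

Whether both options are needed depends on the local Slurm setup, so the
result is worth checking with ompi_info after the rebuild.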
We should have good documentation on how to deal with site-specific
customisations, though.
Is anyone who is doing that (JSC, CSCS, TAMU?) up for writing up some
documentation for this?
The existing documentation has some examples hinting towards a possible
setup: http://easybuild.readthedocs.io/en/latest/Configuration.html#example
regards,
Kenneth