Hi Ole,

On 26/10/16 16:03, Ole Holm Nielsen wrote:
We use the foss-2016b toolchain, and we need OpenMPI to be built with Slurm resource manager support. It seems that the foss-2016b build doesn't include Slurm:

# ml OpenMPI/1.10.3-GCC-5.4.0-2.26
# ompi_info | egrep -i 'slurm|pmi'
MCA ess: slurm (MCA v2.0.0, API v3.0.0, Component v1.10.3)
MCA plm: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.3)
MCA ras: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.3)

Our multi-node MPI jobs fail miserably, and I surmise that this is due to the lacking Slurm support.

Slurm seems to require a build of OpenMPI with 1) --with-pmi and/or 2) --with-slurm. References:

1) https://www.mail-archive.com/easybuild@lists.ugent.be/msg01975.html
2) https://www.open-mpi.org/faq/?category=slurm

I tried making a copy of the EB file OpenMPI-1.10.3-GCC-5.4.0-2.26.eb and appending a line:

configopts += '--with-slurm --with-pmi '

and rebuilding the module with eb --force. Unfortunately, the resulting module seems *not* to include my updated configopts (looking at the file $EASYBUILD_PREFIX/ebfiles_repo/OpenMPI/OpenMPI-1.10.3-GCC-5.4.0-2.26.eb).

Question: How do I rebuild the OpenMPI module with proper Slurm support?

Rebuilding with --force should work, so for some reason your customized easyconfig was not picked up... How exactly did you provide it to EasyBuild? Was it available in the local directory where you ran the 'eb' command?

You can verify that the right easyconfig is picked up via a dry run like: "eb OpenMPI-1.10.3-GCC-5.4.0-2.26.eb -Df", which will print the path to the easyconfig files used.
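
As a complement to the dry run, a quick text check can confirm that the easyconfig file at the printed path really contains the extra configure options. The following is only a minimal sketch: the function name missing_flags is hypothetical, and the plain line scan is an assumption made for illustration. It is not part of EasyBuild, and it does not parse the file the way EasyBuild does (for example, it will not see options inside a multi-line string).

```python
def missing_flags(easyconfig_text, flags=("--with-slurm", "--with-pmi")):
    """Return the flags that appear on no active 'configopts' line.

    Hypothetical helper, not an EasyBuild API. Lines that are commented
    out do not start with 'configopts' and are therefore ignored.
    """
    configopts_lines = [
        line for line in easyconfig_text.splitlines()
        if line.strip().startswith("configopts")
    ]
    joined = " ".join(configopts_lines)
    return [flag for flag in flags if flag not in joined]
```

Running it on the contents of the easyconfig reported by the dry run, and again on the copy under $EASYBUILD_PREFIX/ebfiles_repo, should return an empty list in both cases if the customized file was used.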

Question: Can Slurm support please be included in future versions of the OpenMPI module in the foss-201x tool chain?
This is left as a site-specific customization, since hard-coding --with-slurm would make the installation fail on any system that does not have Slurm.
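
To illustrate why the options cannot simply be hard-coded, here is a minimal Python sketch (easyconfig files use Python syntax) of a site-side helper that appends the Slurm options only when Slurm appears to be installed. The function name site_configopts and the use of 'srun' on the PATH as a detection hint are assumptions made for this example, not something the stock easyconfigs or EasyBuild do.

```python
import shutil


def site_configopts(base, have_slurm=None):
    """Append Slurm-related configure options only when Slurm seems present.

    'base' is the existing configopts string, with a trailing space.
    Hypothetical helper for illustration only.
    """
    if have_slurm is None:
        # Assumption: finding 'srun' on PATH is a good enough hint
        # that Slurm is installed on this system.
        have_slurm = shutil.which("srun") is not None
    if have_slurm:
        return base + "--with-slurm --with-pmi "
    return base
```

On a system without Slurm the base options come back unchanged, which is the behaviour a generic easyconfig needs. A site that always runs Slurm can just keep its own copy of the easyconfig with the extra configopts line.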

We should have good documentation on how to deal with site-specific customisations, though. Is anyone who is doing that (JSC, CSCS, TAMU?) up for writing some documentation on this? The existing documentation has some examples hinting at a possible setup: http://easybuild.readthedocs.io/en/latest/Configuration.html#example


regards,

Kenneth
