Dear Ole,

On 26/09/2023 08:24, Ole Holm Nielsen wrote:
I'm starting EasyBuild up on our new AMD "Genoa" platform with 1 AMD EPYC 9124 16-Core Processor with 2 threads/core, 384 GB RAM, Ethernet network only, and AlmaLinux 8.8 OS.

Unfortunately, building the foss-2022b toolchain exits during the testing phase of OpenMPI-4.1.4-GCC-12.2.0.eb as shown below.  Does anyone have ideas about what might be wrong?

$ eb foss-2022b.eb -r
(lines deleted)
== processing EasyBuild easyconfig /home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-12.2.0.eb
== building and installing OpenMPI/4.1.4-GCC-12.2.0...
== fetching files...
== ... (took 1 secs)
== creating build dir, resetting environment...
== unpacking...
== ... (took 1 secs)
== patching...
== preparing...
== configuring...
== ... (took 2 mins 22 secs)
== building...
== ... (took 4 mins 24 secs)
== testing...
== ... (took 36 secs)
== installing...
== ... (took 1 min 15 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== ... (took 5 secs)
== FAILED: Installation ended unsuccessfully (build directory: /dev/shm/OpenMPI/4.1.4/GCC-12.2.0): build failed (first 300 chars): Sanity check failed: sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 /dev/shm/OpenMPI/4.1.4/GCC-12.2.0/mpi_test_hello_c exited with code 1 (output: -------------------------------------------------------------------------- A requested component was not found, or was unable to be (took 8 mins 48 secs)
== Results of the build can be found in the log file(s) /tmp/eb-watuyqhw/easybuild-OpenMPI-4.1.4-20230926.080727.GEZtD.log
ERROR: Build of /home/modules/software/EasyBuild/4.8.1/easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.4-GCC-12.2.0.eb failed (err: 'build failed (first 300 chars): Sanity check failed: sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 /dev/shm/OpenMPI/4.1.4/GCC-12.2.0/mpi_test_hello_c exited with code 1 (output: --------------------------------------------------------------------------\nA requested component was not found, or was unable to be')


The log file shows messages about missing components:

(lines deleted)
--------------------------------------------------------------------------
[e000.nifl.fysik.dtu.dk:1849636] PML cm cannot be selected
[e000.nifl.fysik.dtu.dk:1849635] PML cm cannot be selected
[e000.nifl.fysik.dtu.dk:1849626] 2 more processes have sent help message help-mca-base.txt / find-available:not-valid
[e000.nifl.fysik.dtu.dk:1849626] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[e000.nifl.fysik.dtu.dk:1849626] 1 more process has sent help message help-mca-base.txt / find-available:none found
)
sanity check command mpirun -n 1 /dev/shm/OpenMPI/4.1.4/GCC-12.2.0/mpi_test_hello_usempi exited with code 1 (output: --------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      e000.nifl.fysik.dtu.dk
Framework: mtl
Component: psm2
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

   Host:      e000
   Framework: pml
--------------------------------------------------------------------------
[e000.nifl.fysik.dtu.dk:1849661] PML cm cannot be selected
) (at easybuild/framework/easyblock.py:3655 in _sanity_check_step)
== 2023-09-26 08:16:16,111 build_log.py:267 INFO ... (took 5 secs)
== 2023-09-26 08:16:16,111 filetools.py:2012 INFO Removing lock /home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.4-GCC-12.2.0.lock...
== 2023-09-26 08:16:16,112 filetools.py:383 INFO Path /home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.4-GCC-12.2.0.lock successfully removed.
== 2023-09-26 08:16:16,112 filetools.py:2016 INFO Lock removed: /home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.4-GCC-12.2.0.lock
== 2023-09-26 08:16:16,112 easyblock.py:4277 WARNING build failed (first 300 chars): Sanity check failed: sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 /dev/shm/OpenMPI/4.1.4/GCC-12.2.0/mpi_test_hello_c exited with code 1 (output: --------------------------------------------------------------------------
A requested component was not found, or was unable to be
== 2023-09-26 08:16:16,112 easyblock.py:328 INFO Closing log for application name OpenMPI version 4.1.4


Note: This node is NOT equipped with Infiniband or Omni-Path, just plain Ethernet.

By default, OpenMPI is configured with "--with-verbs"; you should see that option show up in the log file (or use "eb --trace" to get some more info during the installation).

If you don't have Infiniband, you should add --without-verbs via configopts in your OpenMPI easyconfig file (which should prevent the OpenMPI easyblock from adding --with-verbs), or use a hook. See for example https://docs.easybuild.io/hooks/#replace-with-verbs-with-without-verbs-in-openmpi-configure-options, although that exact example won't work here: instead, you should just hard inject --without-verbs into self.cfg['configopts'] in the pre_configure_hook.
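
A minimal sketch of what such a hook could look like is given here. It is untested on your system and rests on a few assumptions: that self.cfg['configopts'] is a plain string at this point, that appending to it is enough to stop the easyblock from adding --with-verbs, and that the hooks file lives at a path of your choosing (the file name hooks.py is just an illustration).

```python
# hooks.py: hypothetical file name; pass its location to eb via --hooks

def pre_configure_hook(self, *args, **kwargs):
    """Inject --without-verbs into the OpenMPI configure options.

    Assumption: self.cfg['configopts'] is a plain string here.
    """
    # Leave every other software package untouched.
    if self.name != 'OpenMPI':
        return

    configopts = self.cfg['configopts']

    # Do nothing if the option is already there (e.g. set in the easyconfig).
    if '--without-verbs' in configopts:
        return

    self.log.info("[pre-configure hook] Adding --without-verbs to configopts")
    # Append the option, avoiding a stray leading space when configopts is empty.
    self.cfg['configopts'] = (configopts + ' --without-verbs').strip()
```

You would then point EasyBuild at the file, for example with "eb --hooks=/path/to/hooks.py foss-2022b.eb -r" (adjust the path to wherever you keep the file), and check the log or the "eb --trace" output to confirm that --without-verbs ends up in the configure command.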





regards,

Kenneth
