Hmm, this has now come up a few times. Open MPI does not like hyperthreading and only counts the physical cores, while EB passes the number of cores it sees (hardware threads included) as the number of required slots. Without oversubscription the example will not run. Either we allow oversubscription, or we find a way to detect the hyperthreading and request only the physical cores.
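For reference, a minimal sketch of what those two options amount to, using the mpirun flags that Open MPI itself names in the error message quoted below; the lscpu line is only one possible way to quantify the hyperthreading, not something EB does today:

  # tell Open MPI to count hardware threads as slots
  mpirun --use-hwthread-cpus -np 48 <mpi_program>

  # or ignore the slot count entirely
  # (if memory serves, setting OMPI_MCA_rmaps_base_oversubscribe=true has the same effect)
  mpirun --oversubscribe -np 48 <mpi_program>

  # quantify the hyperthreading: physical cores = Socket(s) x Core(s) per socket
  # (2 x 12 = 24 on the E5-2650 v4 node below, versus 48 hardware threads)
  lscpu | grep -E '^(Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core)'

Note that ORCA launches mpirun itself, so in practice such flags would have to be injected by the easyblock (or via Open MPI's environment) rather than typed on the command line.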
There are a few open issues on this, see https://github.com/easybuilders/easybuild-easyblocks/pull/2611 and the linked issues. For an immediate fix, you just need to limit the number of cores used for the build, e.g. use the eb option `--parallel=12` (a concrete command line is sketched at the end of this thread).

On Mon, 15 Nov 2021 at 09:06, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

We use EB 4.5.0 and would like to install this module:

  $ eb ORCA-5.0.1-gompi-2021a.eb -r

but it fails with:

  == FAILED: Installation ended unsuccessfully (build directory: /dev/shm/ORCA/5.0.1/gompi-2021a):
  build failed (first 300 chars): Sanity check failed: sanity check command
  $EBROOTORCA/bin/orca /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL SINGLE POINT ENERGY[ \t]*-75.95934031'
  exited with code 1 (output: --------------------------------------------------------------------------
  There are not enough slots
  (took 1 min 50 secs)
  == Results of the build can be found in the log file(s) /tmp/eb-2QJPW_/easybuild-ORCA-5.0.1-20211111.140110.qlMvK.log
  ERROR: Build of /home/modules/software/EasyBuild/4.5.0/easybuild/easyconfigs/o/ORCA/ORCA-5.0.1-gompi-2021a.eb failed
  (err: "build failed (first 300 chars): Sanity check failed: sanity check command
  $EBROOTORCA/bin/orca /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL SINGLE POINT ENERGY[ \t]*-75.95934031'
  exited with code 1 (output: --------------------------------------------------------------------------
  There are not enough slots")

There are further errors in the logfile:

  == 2021-11-11 14:03:00,669 build_log.py:169 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:124 in __init__):
  Sanity check failed: sanity check command
  $EBROOTORCA/bin/orca /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL SINGLE POINT ENERGY[ \t]*-75.95934031'
  exited with code 1 (output:
  --------------------------------------------------------------------------
  There are not enough slots available in the system to satisfy the 48 slots
  that were requested by the application:

    /home/modules/software/ORCA/5.0.1-gompi-2021a/bin/orca_gtoint_mpi

  Either request fewer slots for your application, or make more slots
  available for use.

  A "slot" is the Open MPI term for an allocatable unit where we can launch
  a process. The number of slots available are defined by the environment in
  which Open MPI processes are run:

    1. Hostfile, via "slots=N" clauses (N defaults to number of processor
       cores if not provided)
    2. The --host command line parameter, via a ":N" suffix on the hostname
       (N defaults to 1 if not provided)
    3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
    4. If none of a hostfile, the --host command line parameter, or an RM is
       present, Open MPI defaults to the number of processor cores

  In all the above cases, if you want Open MPI to default to the number of
  hardware threads instead of the number of processor cores, use the
  --use-hwthread-cpus option.

  Alternatively, you can use the --oversubscribe option to ignore the number
  of available slots when deciding the number of processes to launch.
  --------------------------------------------------------------------------
  [file orca_tools/qcmsg.cpp, line 458]: .... aborting the run
  0 ) (at easybuild/framework/easyblock.py:3311 in _sanity_check_step)

Question: Why does the ORCA testing request 48 MPI "slots" (MPI tasks, I suppose) and then fail? The build host has two Intel(R) Xeon(R) CPU E5-2650 v4 processors for 48 cores (including Hyperthreading).
The ORCA input file /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp contains:

  !HF DEF2-SVP
  %PAL NPROCS 48 END
  * xyz 0 1
  O      0.0000    0.0000    0.0626
  H     -0.7920    0.0000   -0.4973
  H      0.7920    0.0000   -0.4973
  *

The user's limits would seem to be sufficient:

  $ ulimit -a
  core file size          (blocks, -c) 0
  data seg size           (kbytes, -d) 50000000
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 1030498
  max locked memory       (kbytes, -l) 64
  max memory size         (kbytes, -m) 50000000
  open files                      (-n) 1024
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) unlimited
  cpu time               (seconds, -t) 240000
  max user processes              (-u) 2500
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited

Thanks for sharing any insights.

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark

--
Dr. Alan O'Cais
E-CAM Software Manager
Juelich Supercomputing Centre
Forschungszentrum Juelich GmbH
52425 Juelich, Germany

Phone: +49 2461 61 5213
Fax: +49 2461 61 6656
E-mail: a.oc...@fz-juelich.de
WWW: http://www.fz-juelich.de/ias/jsc/EN
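For completeness, a sketch of the immediate workaround mentioned at the top of the thread, on a node like the one described above. The value 12 is simply the one suggested; anything up to the 24 physical cores should keep Open MPI within its slot count, since the NPROCS value in eb_test_hf_water.inp appears to follow EB's parallel setting:

  $ eb ORCA-5.0.1-gompi-2021a.eb -r --parallel=12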