Hmm, this has now come up a few times. Open MPI ignores hyperthreading and only
counts physical cores when it works out how many slots are available, while EB
passes the number of (logical) cores it detects as the number of required
slots. Without oversubscription the test example will not run. Either we allow
oversubscription, or we find a way to detect hyperthreading and account for it.
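
(As a quick illustration, nothing EasyBuild-specific: you can see the mismatch
on a build host by comparing the logical CPU count, which is what EB detects,
with the physical core count, which is what Open MPI uses for its default slot
count. The numbers below are what I'd expect on a machine like yours, i.e.
2 sockets x 12 cores with hyperthreading enabled.)

$ nproc
48
$ lscpu | grep -E '^(Socket|Core|Thread)'
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2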

There are a few open issues on this; see
https://github.com/easybuilders/easybuild-easyblocks/pull/2611 and the issues
linked there.

For an immediate fix, you just need to limit the number of cores used for the
build, e.g. via the eb option `--parallel=12`.
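
For example, re-using your exact command from below (just a sketch; anything up
to the number of physical cores should work):

$ eb ORCA-5.0.1-gompi-2021a.eb -r --parallel=12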


On Mon, 15 Nov 2021 at 09:06, Ole Holm Nielsen 
<ole.h.niel...@fysik.dtu.dk> wrote:
We use EB 4.5.0 and would like to install this module:

$ eb ORCA-5.0.1-gompi-2021a.eb -r

but it fails with:

== FAILED: Installation ended unsuccessfully (build directory:
/dev/shm/ORCA/5.0.1/gompi-2021a): build failed (first 300 chars): Sanity
check failed: sanity check command $EBROOTORCA/bin/orca
/dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
SINGLE POINT ENERGY[    ]*-75.95934031' exited with code 1 (output:
--------------------------------------------------------------------------
There are not enough slots (took 1 min 50 secs)
== Results of the build can be found in the log file(s)
/tmp/eb-2QJPW_/easybuild-ORCA-5.0.1-20211111.140110.qlMvK.log
ERROR: Build of
/home/modules/software/EasyBuild/4.5.0/easybuild/easyconfigs/o/ORCA/ORCA-5.0.1-gompi-2021a.eb
failed (err: "build failed (first 300 chars): Sanity check failed: sanity
check command $EBROOTORCA/bin/orca
/dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
SINGLE POINT ENERGY[ \t]*-75.95934031' exited with code 1 (output:
--------------------------------------------------------------------------\nThere
are not enough slots")


There are further errors in the logfile:

== 2021-11-11 14:03:00,669 build_log.py:169 ERROR EasyBuild crashed with
an error (at easybuild/base/exceptions.py:124 in __init__): Sanity check
failed: sanity check command $EBROOTORCA/bin/orca
/dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
SINGLE POINT ENERGY[  ]*-75.95934031' exited with code 1 (output:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 48
slots that were requested by the application:

   /home/modules/software/ORCA/5.0.1-gompi-2021a/bin/orca_gtoint_mpi

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

   1. Hostfile, via "slots=N" clauses (N defaults to number of
      processor cores if not provided)
   2. The --host command line parameter, via a ":N" suffix on the
      hostname (N defaults to 1 if not provided)
   3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
   4. If none of a hostfile, the --host command line parameter, or an
      RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
[file orca_tools/qcmsg.cpp, line 458]:
   .... aborting the run

0
) (at easybuild/framework/easyblock.py:3311 in _sanity_check_step)


Question: Why does the ORCA test request 48 MPI "slots" (MPI tasks, I
suppose) and then fail?

The build host has two Intel(R) Xeon(R) CPU E5-2650 v4 processors, i.e. 24
physical cores (48 logical CPUs with Hyperthreading).

The ORCA input file /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp
contains:

!HF DEF2-SVP
%PAL NPROCS 48 END
* xyz 0 1
O   0.0000   0.0000   0.0626
H  -0.7920   0.0000  -0.4973
H   0.7920   0.0000  -0.4973
*

The user's limits would seem to be sufficient:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 50000000
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1030498
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 50000000
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) 240000
max user processes              (-u) 2500
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


Thanks for sharing any insights.

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


--
Dr. Alan O'Cais
E-CAM Software Manager
Juelich Supercomputing Centre
Forschungszentrum Juelich GmbH
52425 Juelich, Germany

Phone: +49 2461 61 5213
Fax: +49 2461 61 6656
E-mail: a.oc...@fz-juelich.de
WWW:    http://www.fz-juelich.de/ias/jsc/EN


