Hi,
We need to manage a heterogeneous cluster and I am looking at how to
organise building the software in this context. My current idea is the
following:
1. Software is created within the following directory tree
/nfs/easybuild/arch/x86_64/amd
.../amd/zen3
.../intel
.../intel/cascadelake
.../intel/skylake_avx512
.../generic
The paths below 'arch' correspond to those produced by
https://github.com/EESSI/software-layer/blob/2023.06/eessi_software_subdir.py
2. When a node boots, a systemd service creates the directory
/sw/sc/easybuild
and, within it, the following links
generic -> /nfs/easybuild/arch/x86_64/generic
optimized -> /nfs/easybuild/arch/x86_64/intel/skylake_avx512
3. Binary-only software is installed from an administration node by
running EasyBuild with
--prefix=/sw/sc/generic
Software optimized for a specific architecture is built by submitting a
Slurm job to a node with the required architecture, using
--prefix=/sw/sc/optimized
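To make steps 2 and 3 concrete, here is a rough Python sketch of what the
boot-time service and the job submission might do. The function names are
made up for illustration. It assumes the architecture subdirectory (e.g.
'x86_64/intel/skylake_avx512') has already been obtained from the EESSI
script, and that each Slurm node carries a feature named after its
architecture, which may not match the actual setup.

```python
import shlex
from pathlib import Path

NFS_ROOT = Path("/nfs/easybuild/arch")  # tree from step 1
LINK_DIR = Path("/sw/sc/easybuild")     # directory from step 2


def create_links(subdir, nfs_root=NFS_ROOT, link_dir=LINK_DIR):
    """Create the 'generic' and 'optimized' links for one node.

    subdir is the node's architecture path below 'arch', e.g.
    'x86_64/intel/skylake_avx512' (assumed to come from the EESSI script).
    """
    nfs_root, link_dir = Path(nfs_root), Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    cpu_family = subdir.split("/")[0]  # e.g. 'x86_64'
    targets = {
        "generic": nfs_root / cpu_family / "generic",
        "optimized": nfs_root / subdir,
    }
    for name, target in targets.items():
        link = link_dir / name
        # Replace a stale link from a previous boot; anything that is not
        # a symlink is left alone, so symlink_to() fails loudly on it.
        if link.is_symlink():
            link.unlink()
        link.symlink_to(target)
    return targets


def sbatch_command(easyconfig, feature, prefix):
    """Build an sbatch call that runs EasyBuild on a node with 'feature'.

    Assumption: nodes have a Slurm feature named after their architecture.
    """
    eb = shlex.join(["eb", f"--prefix={prefix}", easyconfig])
    return ["sbatch", f"--constraint={feature}", f"--wrap={eb}"]
```

The list returned by sbatch_command() could be passed to subprocess.run(),
e.g. sbatch_command("foo.eb", "zen3", "/sw/sc/optimized"), where "foo.eb"
and "zen3" are placeholders.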
Does this sound plausible? Have I overlooked anything?
One thing I am not quite clear on is the following:
What would be the best way to determine whether an easyconfig (EC) uses a
binary easyblock or the 'SYSTEM' toolchain, and thus whether the software
should be installed in 'generic'? Or would it be better to accept that
disk space is cheap and simply install binary and 'SYSTEM' packages for
each architecture in order to simplify things?
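One possibility I can imagine is to inspect the text of the EC directly.
The Python sketch below does this with the standard 'ast' module, without
using EasyBuild itself. It rests on several assumptions: that 'Binary' and
'PackedBinary' are the easyblocks that count as binary-only, that the
toolchain is written either as the constant SYSTEM or as a dict with name
'system', and that anything else should go to 'optimized'. ECs with no
'easyblock' line (software-specific easyblocks) cannot be classified this
way, so EasyBuild's own parser would presumably be more reliable.

```python
import ast

# Assumption: easyblocks treated as "binary only" for this purpose.
BINARY_EASYBLOCKS = {"Binary", "PackedBinary"}


def _is_system_toolchain(node):
    """Check the right-hand side of a 'toolchain = ...' assignment."""
    # Form 1: toolchain = SYSTEM
    if isinstance(node, ast.Name):
        return node.id == "SYSTEM"
    # Form 2: toolchain = {'name': 'system', 'version': 'system'}
    if isinstance(node, ast.Dict):
        for key, value in zip(node.keys, node.values):
            if (isinstance(key, ast.Constant) and key.value == "name"
                    and isinstance(value, ast.Constant)):
                return value.value == "system"
    return False


def install_target(easyconfig_text):
    """Return 'generic' or 'optimized' for the given easyconfig source."""
    easyblock = None
    system_toolchain = False
    # Only top-level assignments are examined; nothing is evaluated, so
    # EasyBuild constants used elsewhere in the file do not matter.
    for stmt in ast.parse(easyconfig_text).body:
        if not isinstance(stmt, ast.Assign):
            continue
        for target in stmt.targets:
            if not isinstance(target, ast.Name):
                continue
            if target.id == "easyblock" and isinstance(stmt.value, ast.Constant):
                easyblock = stmt.value.value
            elif target.id == "toolchain":
                system_toolchain = _is_system_toolchain(stmt.value)
    if easyblock in BINARY_EASYBLOCKS or system_toolchain:
        return "generic"
    return "optimized"
```

The result could then be used to choose between the two --prefix values
from step 3.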
Any help/comments much appreciated.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin