Hi,

We need to manage a heterogeneous cluster, and I am looking at how to
organise building the software in this context.  My current idea is the
following:

  1. Software is created within the following directory tree

  /nfs/easybuild/arch/x86_64/amd
                         .../amd/zen3
                         .../intel
                         .../intel/cascadelake
                         .../intel/skylake_avx512
                         .../generic

  The paths below 'arch' correspond to those produced by

  https://github.com/EESSI/software-layer/blob/2023.06/eessi_software_subdir.py

  2. Each time a node boots, a systemd service creates the following
  directory

  /sw/sc/easybuild

  and, within it, the following links (shown here for a Skylake node)

  generic -> /nfs/easybuild/arch/x86_64/generic
  optimized -> /nfs/easybuild/arch/x86_64/intel/skylake_avx512

  3. Binary-only software is installed from an administration node by
  running EasyBuild with

  --prefix=/sw/sc/generic

  Software optimized for a specific architecture is built by submitting
  a Slurm job to a node with the required architecture and using

  --prefix=/sw/sc/optimized
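
The link creation in step 2 could be sketched roughly as follows.  This
is only an illustration under assumptions: the function name
create_links is hypothetical, and the node's subdirectory string (for
example 'x86_64/intel/skylake_avx512') is assumed to have been obtained
already, e.g. from the EESSI script mentioned above, and is simply
passed in as an argument.

```python
import os
from pathlib import Path


def create_links(link_dir, arch_root, cpu_family, optimized_subdir):
    """Create the 'generic' and 'optimized' links for one node.

    All arguments are assumptions for illustration:
      link_dir         -- directory that holds the links
      arch_root        -- root of the per-architecture tree on NFS
      cpu_family       -- e.g. 'x86_64'
      optimized_subdir -- e.g. 'x86_64/intel/skylake_avx512'
    """
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)

    targets = {
        "generic": Path(arch_root) / cpu_family / "generic",
        "optimized": Path(arch_root) / optimized_subdir,
    }
    for name, target in targets.items():
        link = link_dir / name
        tmp = link_dir / f".{name}.tmp"
        # Remove a leftover temporary link from an interrupted run.
        if tmp.is_symlink() or tmp.exists():
            tmp.unlink()
        # Create the new link under a temporary name, then rename it
        # over the old one so that re-running at boot is harmless.
        tmp.symlink_to(target)
        os.replace(tmp, link)
    return targets


if __name__ == "__main__":
    create_links(
        "/sw/sc/easybuild",
        "/nfs/easybuild/arch",
        "x86_64",
        "x86_64/intel/skylake_avx512",
    )
```

A systemd unit of type 'oneshot' could call such a script once the NFS
mount is available; the unit itself is not shown here.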

Does this sound plausible?  Have I overlooked anything?

One thing I am not quite clear on is the following:

What would be the best way to determine whether an easyconfig (EC)
specifies a binary EasyBlock or uses the 'SYSTEM' toolchain, so that the
software should be built in 'generic'?  Or would it be better to accept
that disk space is cheap and simply install binary and 'SYSTEM' packages
for each architecture in order to simplify things?
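
One possible heuristic for the first option, as a rough sketch only: it
does not use the EasyBuild framework API, but relies on easyconfigs
being written in Python syntax and inspects the top-level 'easyblock'
and 'toolchain' assignments with the standard 'ast' module.  The names
BINARY_EASYBLOCKS and choose_prefix, and the set of easyblocks treated
as binary-only, are assumptions for illustration.  Easyconfigs that rely
on a software-specific easyblock (no explicit 'easyblock' line) are not
recognised as binary by this check.

```python
import ast

# Assumption: generic easyblocks regarded as "binary-only" at this site.
BINARY_EASYBLOCKS = {"Binary", "PackedBinary", "Tarball", "JAR"}


def top_level_assignments(source):
    """Map each simple top-level 'name = value' to its AST value node."""
    assigned = {}
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            assigned[node.targets[0].id] = node.value
    return assigned


def uses_system_toolchain(value):
    """True for 'toolchain = SYSTEM' or a dict naming 'system'/'dummy'."""
    if isinstance(value, ast.Name):
        return value.id == "SYSTEM"
    if isinstance(value, ast.Dict):
        for key, val in zip(value.keys, value.values):
            if (isinstance(key, ast.Constant) and key.value == "name"
                    and isinstance(val, ast.Constant)):
                return val.value in ("system", "dummy")
    return False


def choose_prefix(source,
                  generic="/sw/sc/generic",
                  optimized="/sw/sc/optimized"):
    """Return the install prefix suggested by the easyconfig text."""
    assigned = top_level_assignments(source)
    easyblock = assigned.get("easyblock")
    is_binary = (isinstance(easyblock, ast.Constant)
                 and easyblock.value in BINARY_EASYBLOCKS)
    is_system = uses_system_toolchain(assigned.get("toolchain"))
    return generic if (is_binary or is_system) else optimized


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        print(choose_prefix(handle.read()))
```

One caveat with any rule of this kind: some easyconfigs with the
'SYSTEM' toolchain still compile code with the system compiler (GCCcore
is one example), so the 'SYSTEM' test alone may place compiled software
in 'generic'.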

Any help/comments much appreciated.

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
