Hi Jakob,
Jakob Schiotz writes:
> On our cluster, the different nodes mount the /home/modules folder
> from different file systems, so each node sees the modules for its
> own architecture. The advantage is that if a script or something
> else (like a Python venv) contains a path to a file in a module, that
> path will always point to an executable built for the right
> architecture.
So under /home/modules you also have the software? Or do you have the
whole software tree mounted separately, with the architecture-specific
modules just containing paths that point to the right part of the
software tree?
> If the folder names are different, then I do not think that you can
> build a venv on one architecture and use it on another. And that is
> often very important for our production runs, where different jobs
> belonging to the same project are submitted to different types of
> nodes.
I don't understand the point about different folder names. I was
thinking that /sw/sc/easybuild/{generic,optimized} would point to the
subdirectory of the overall EasyBuild tree which contains the modules
and software for the given node.
> https://wiki.fysik.dtu.dk/Niflheim_system/EasyBuild_modules/#setting-the-cpu-hardware-architecture
Because the file system which contains EasyBuild, amongst other things,
is already mounted, I was planning to use symbolic links rather than
something like autofs.
Cheers,
Loris
> Best regards
>
> Jakob
>
>> On 3 Nov 2023, at 11.08, Loris Bennett wrote:
>>
>> Hi,
>>
>> We need to manage a heterogeneous cluster and I am looking at how to
>> organise building the software in this context. My current idea is
>> the following:
>>
>> 1. Software is created within the following directory tree
>>
>> /nfs/easybuild/arch/x86_64/amd
>> .../amd/zen3
>> .../intel
>> .../intel/cascadelake
>> .../intel/skylake_avx512
>> .../generic
>>
>> The paths below 'arch' correspond to those produced by
>>
>> https://github.com/EESSI/software-layer/blob/2023.06/eessi_software_subdir.py
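A simplified sketch of that mapping step in Python, assuming the CPU vendor and microarchitecture have already been detected and normalised to lower-case names such as 'intel' and 'skylake_avx512'. The EESSI script linked above does its own CPU detection, which is not reproduced here; the SUPPORTED set below is a hypothetical list that simply mirrors the tree above, and any CPU not in it falls back to 'generic'.

```python
# Hypothetical set of targets for which an optimized tree is built.
# It mirrors the directory tree above and is not taken from EESSI.
SUPPORTED = {
    ("amd", "zen3"),
    ("intel", "cascadelake"),
    ("intel", "skylake_avx512"),
}


def software_subdir(vendor: str, uarch: str, arch: str = "x86_64") -> str:
    """Return the subdirectory below 'arch' for an already-detected CPU.

    Unknown vendor/microarchitecture combinations fall back to the
    generic tree instead of failing.
    """
    key = (vendor.lower(), uarch.lower())
    if key in SUPPORTED:
        return f"{arch}/{key[0]}/{key[1]}"
    return f"{arch}/generic"


if __name__ == "__main__":
    print(software_subdir("intel", "skylake_avx512"))  # x86_64/intel/skylake_avx512
    print(software_subdir("AMD", "zen2"))               # x86_64/generic
```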
>>
>> 2. When each node is booted, a systemd service creates the following
>> directory
>>
>> /sw/sc/easybuild
>>
>> and in that the following links
>>
>> generic -> /nfs/easybuild/arch/x86_64/generic
>> optimized -> /nfs/easybuild/arch/x86_64/intel/skylake_avx512
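A minimal sketch, in Python, of what the boot-time service could run, for example from the ExecStart of a hypothetical Type=oneshot unit. The function name create_links and its parameters are illustrative only; optimized_subdir would be the node's subdirectory as produced by the EESSI script. Each link is first created under a temporary name and then renamed over the old one, so re-running the service replaces existing links atomically instead of failing because they already exist.

```python
import os
import tempfile
from pathlib import Path


def create_links(link_dir: str, nfs_root: str, optimized_subdir: str) -> None:
    """Create, or atomically replace, the 'generic' and 'optimized' links.

    link_dir:         e.g. /sw/sc/easybuild
    nfs_root:         e.g. /nfs/easybuild/arch
    optimized_subdir: e.g. x86_64/intel/skylake_avx512
    """
    arch = optimized_subdir.split("/")[0]  # e.g. "x86_64"
    targets = {
        "generic": os.path.join(nfs_root, arch, "generic"),
        "optimized": os.path.join(nfs_root, optimized_subdir),
    }
    Path(link_dir).mkdir(parents=True, exist_ok=True)
    for name, target in targets.items():
        link = os.path.join(link_dir, name)
        tmp = link + ".tmp"
        if os.path.lexists(tmp):  # left over from an interrupted run
            os.remove(tmp)
        os.symlink(target, tmp)
        # rename() swaps an existing symlink in a single step, so re-running
        # the service never leaves a window without the link.
        os.replace(tmp, link)


if __name__ == "__main__":
    # Dry run in a scratch directory instead of the real /sw and /nfs paths.
    with tempfile.TemporaryDirectory() as scratch:
        links = os.path.join(scratch, "sw/sc/easybuild")
        nfs = os.path.join(scratch, "nfs/easybuild/arch")
        create_links(links, nfs, "x86_64/intel/skylake_avx512")
        for name in ("generic", "optimized"):
            print(name, "->", os.readlink(os.path.join(links, name)))
```

Because every node exposes the same path, /sw/sc/easybuild/optimized, a path recorded once in a script or venv is the same string on every node but resolves to that node's architecture-specific tree.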
>>
>> 3. Binary-only software is installed via an administration node by
>> running EasyBuild with
>>
>> --prefix=/sw/sc/easybuild/generic
>>
>> Software optimized for a specific architecture is built by submitting
>> a Slurm job to a node with the required architecture and using
>>
>> --prefix=/sw/sc/easybuild/optimized
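A sketch of the submission step in Python. It assumes, hypothetically, that the Slurm node features are named after the microarchitecture, so that sbatch's --constraint option can select a matching node; feature names are site-defined in slurm.conf. --constraint, --job-name and --wrap are standard sbatch options and --prefix is the EasyBuild option mentioned above; the default prefix is the 'optimized' link from step 2, and the easyconfig name in the demo is a placeholder.

```python
import shlex
import subprocess


def build_submit_cmd(easyconfig: str, feature: str,
                     prefix: str = "/sw/sc/easybuild/optimized") -> list[str]:
    """Return an sbatch command that builds `easyconfig` on a node with `feature`.

    Assumes (hypothetically) that Slurm node features are named after the
    microarchitecture, e.g. Feature=skylake_avx512 in slurm.conf.
    """
    eb_cmd = ["eb", easyconfig, f"--prefix={prefix}"]
    return [
        "sbatch",
        f"--constraint={feature}",      # only run on nodes with this feature
        f"--job-name=eb-{feature}",
        "--wrap", shlex.join(eb_cmd),   # sbatch wraps the string in a shell script
    ]


def submit(easyconfig: str, feature: str) -> None:
    """Submit the build job; raises CalledProcessError if sbatch fails."""
    subprocess.run(build_submit_cmd(easyconfig, feature), check=True)


if __name__ == "__main__":
    # Print the command instead of submitting it.
    print(shlex.join(build_submit_cmd("example-1.0-foss-2023a.eb", "skylake_avx512")))
```

Returning the argument list rather than calling sbatch directly keeps the command easy to inspect or log before anything is submitted.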
>>
>> Does this sound plausible? Have I overlooked anything?
>>
>> One thing I am not quite clear on is the following:
>>
>> What would be the best way to determine whether an easyconfig (EC)
>> specifies a binary EasyBlock or uses the 'SYSTEM' toolchain, so that
>> the software should be installed in 'generic'? Or would it be better
>> to say that disk space is cheap and simply install binary and 'SYSTEM'
>> packages for each architecture in order to simplify things?
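One possible approach to the first question, sketched in Python: easyconfig files use Python syntax, so they can be parsed with the standard ast module, without executing them, and the easyblock and toolchain assignments inspected. This is a static heuristic, not EasyBuild's own easyconfig parser. The BINARY_EASYBLOCKS set (EasyBuild's generic Binary, PackedBinary and Tarball easyblocks) is an assumption to be extended as needed, and an easyconfig without an explicit easyblock line is treated as non-binary.

```python
import ast

# Assumed set of EasyBuild generic easyblocks that install pre-built files;
# extend as needed.
BINARY_EASYBLOCKS = {"Binary", "PackedBinary", "Tarball"}


def _is_system_toolchain(value: ast.expr) -> bool:
    """True for `toolchain = SYSTEM` or an old-style system/dummy dict."""
    if isinstance(value, ast.Name):
        return value.id == "SYSTEM"
    if isinstance(value, ast.Dict):
        try:
            tc = ast.literal_eval(value)
        except (ValueError, TypeError):
            return False
        return str(tc.get("name", "")).lower() in {"system", "dummy"}
    return False


def classify_easyconfig(source: str) -> str:
    """Return 'generic' or 'optimized' for the text of an easyconfig file.

    The file is parsed, never executed, so EasyBuild constants such as
    SYSTEM or SOURCE_TAR_GZ need not be defined.
    """
    easyblock = None
    system_tc = False
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        names = {t.id for t in node.targets if isinstance(t, ast.Name)}
        if "easyblock" in names and isinstance(node.value, ast.Constant):
            easyblock = node.value.value
        if "toolchain" in names:
            system_tc = _is_system_toolchain(node.value)
    if easyblock in BINARY_EASYBLOCKS or system_tc:
        return "generic"
    return "optimized"


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, classify_easyconfig(fh.read()))
```

Note that compiler easyconfigs such as GCCcore also use the SYSTEM toolchain, so this rule would place them in 'generic'; whether that is wanted is a site decision.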
>>
>> Any help/comments much appreciated.
>>
>> Cheers,
>>
>> Loris
>>
>> --
>> Dr. Loris Bennett (Herr/Mr)
>> ZEDAT, Freie Universität Berlin
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin