Hi Jakob,

Jakob Schiotz <[email protected]> writes:

> At our cluster, the different nodes mount the /home/modules folder
> from different file systems, so they each contain the modules for the
> appropriate architecture.  The advantage is, that if a script or
> something else (like a Python venv) contains a path to a file in a
> module, it will always be to an executable built for the right
> architecture.

So under /home/modules you also have the software?  Or do you have the
whole software tree mounted separately and the architecture-specific
modules just contain paths which point to the right part of the software
tree?

> If the folder names are different, then I do not think that you can
> build a venv on one architecture, and use it on another.  And that is
> often very important for our production runs, where different jobs as
> part of the same project are submitted to different types of nodes.

I don't understand the point about different folder names.  I was
thinking that /sw/sc/easybuild/{generic,optimized} would point to
a subdirectory of the total EasyBuild tree which contains the modules
and software for the given node.

> https://wiki.fysik.dtu.dk/Niflheim_system/EasyBuild_modules/#setting-the-cpu-hardware-architecture

Because the file system which contains, amongst other things, EasyBuild,
is already mounted, I was planning to use links rather than something
like autofs.   
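
As a rough sketch of what the boot-time step could look like (assuming
Python is available on the node; the function name make_links and the
way the optimized subdirectory is passed in are made up for
illustration, and detecting the architecture is left open):

```python
from pathlib import Path


def make_links(link_dir, arch_root, optimized_subdir):
    """Create 'generic' and 'optimized' symlinks inside link_dir.

    optimized_subdir is a path relative to arch_root, for example
    'intel/skylake_avx512'. How it is determined on each node is
    deliberately not covered here.
    """
    link_dir = Path(link_dir)
    arch_root = Path(arch_root)
    link_dir.mkdir(parents=True, exist_ok=True)

    targets = {
        "generic": arch_root / "generic",
        "optimized": arch_root / optimized_subdir,
    }
    for name, target in targets.items():
        link = link_dir / name
        # Remove a stale link so the step can be re-run safely.
        # A real directory with the same name is left alone and
        # makes symlink_to() fail with FileExistsError.
        if link.is_symlink():
            link.unlink()
        link.symlink_to(target)
    return targets
```

On a Skylake node this would be called as
make_links("/sw/sc/easybuild", "/nfs/easybuild/arch/x86_64",
"intel/skylake_avx512"), which gives the two links from my original
mail.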

Cheers,

Loris

> Best regards
>
> Jakob
>
>
>
>> On 3 Nov 2023, at 11.08, Loris Bennett <[email protected]> wrote:
>> 
>> Hi,
>> 
>> We need to manage a heterogeneous cluster and I am looking at how to
>> organise building the software in this context.  My current idea is the
>> following:
>> 
>>  1. Software is created within the following directory tree
>> 
>>  /nfs/easybuild/arch/x86_64/amd
>>                         .../amd/zen3
>>                         .../intel
>>                         .../intel/cascadelake
>>                         .../intel/skylake_avx512
>>                         .../generic
>> 
>>  The paths below 'arch' correspond to those produced by
>> 
>>  
>> https://github.com/EESSI/software-layer/blob/2023.06/eessi_software_subdir.py
>> 
>>  2. When each node is booted, a systemd service creates the following
>>  directory
>> 
>>  /sw/sc/easybuild
>> 
>>  and in that the following links 
>> 
>>  generic -> /nfs/easybuild/arch/x86_64/generic
>>  optimized -> /nfs/easybuild/arch/x86_64/intel/skylake_avx512
>> 
>>  3. Binary only software is installed via an administration node by
>>  running EasyBuild with
>> 
>>  --prefix=/sw/sc/easybuild/generic
>> 
>>  Software optimized for a specific architecture is built by sending a
>>  job via Slurm to a node with the architecture needed and using
>> 
>>  --prefix=/sw/sc/easybuild/optimized
>> 
>> Does this sound plausible?  Have I overlooked anything?
>> 
>> One thing I am not quite clear on is the following:
>> 
>> What would be the best way to determine whether an EC specifies a binary
>> EasyBlock or whether the toolchain is 'SYSTEM' and thus the software
>> should be built in 'generic'?  Or would it be better to say disk space
>> is cheap and just install binary and 'SYSTEM' packages for each
>> architecture in order to simplify things?
>> 
>> Any help/comments much appreciated.
>> 
>> Cheers,
>> 
>> Loris
>> 
>> -- 
>> Dr. Loris Bennett (Herr/Mr)
>> ZEDAT, Freie Universität Berlin
-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
