Hi Jakob,

Jakob Schiotz <ja...@schiotz.dk> writes:

> Sorry for being late in replying, the spam filter held your mail.

No problem.

>> On 6 Nov 2023, at 15.21, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
>> 
>> Hi Jakob,
>> 
>> Jakob Schiotz <ja...@schiotz.dk> writes:
>> 
>>> At our cluster, the different nodes mount the /home/modules folder
>>> from different file systems, so they each contain the modules for the
>>> appropriate architecture.  The advantage is, that if a script or
>>> something else (like a Python venv) contains a path to a file in a
>>> module, it will always be to an executable built for the right
>>> architecture.
>> 
>> So under /home/modules you also have the software?  Or do you have the
>> whole software tree mounted separately, with the architecture-specific
>> modules just containing paths which point to the right part of the
>> software tree?
>
> /home/modules contains everything built by EasyBuild.  Yes, there is
> some unnecessary duplication, but it is easier not to have to know
> which modules need to be architecture-specific, and just build
> everything for four architectures.

I am also coming round to the idea that accepting a certain amount of
duplication is the pragmatic way to go.
 
>>> If the folder names are different, then I do not think that you can
>>> build a venv on one architecture and use it on another.  And that is
>>> often very important for our production runs, where different jobs in
>>> the same project are submitted to different types of nodes.
>> 
>> I don't understand the point about different folder names.  I was
>> thinking that /sw/sc/easybuild/{generic,optimized} would point to a
>> subdirectory of the total EasyBuild tree which contains the modules
>> and software for the given node.
>
> Yes, if you have some link magic that gives you a path that is always
> the same, regardless of architecture, then things will work.  What
> will not work is if the path to the Python executable is different on
> different nodes, then virtual environments break.

The software would always be built on a compute node, where, say,

  /sw/sc/easybuild/software/Python/3.11.3-GCCcore-12.3.0/bin

would actually point to something like

  /nfs/easybuild/arch/x86_64/intel/skylake_avx512/software/Python/3.11.3-GCCcore-12.3.0/bin

on one node.  On another it might point to

  /nfs/easybuild/arch/x86_64/amd/zen3/software/Python/3.11.3-GCCcore-12.3.0/bin

The modules would always contain the '/sw/sc/easybuild' path, so I would
hope that any venvs would be OK.  
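To illustrate, here is a minimal sketch of why the venvs should survive: all paths are made up and rooted in a scratch directory, with the symlink standing in for the per-node /sw/sc/easybuild prefix.

```shell
#!/bin/sh
# Hypothetical sketch: a venv records the stable symlinked prefix, so
# re-pointing the symlink (as each node does at boot) keeps the recorded
# path valid.  All paths here are invented for the demonstration.
set -e
root=$(mktemp -d)

# Two architecture-specific software trees, as on two node types.
mkdir -p "$root/nfs/arch/skylake_avx512/bin" "$root/nfs/arch/zen3/bin"

# The stable prefix; on this "node" it points at the skylake tree.
ln -s "$root/nfs/arch/skylake_avx512" "$root/sw"

# A venv pins the stable path, not the architecture-specific one.
printf '#!%s/sw/bin/python3\n' "$root" > "$root/venv-shebang"

# On a zen3 node the symlink points elsewhere, but the pinned path
# still resolves -- now to a zen3-built interpreter.
ln -sfn "$root/nfs/arch/zen3" "$root/sw"
```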

>>> https://wiki.fysik.dtu.dk/Niflheim_system/EasyBuild_modules/#setting-the-cpu-hardware-architecture
>> 
>> Because the file system which contains, amongst other things, EasyBuild,
>> is already mounted, I was planning to use links rather than something
>> like autofs.   
>
> I am sure you can do it that way, too.
>
> Best regards
>
> Jakob

Thanks for the information.

Cheers,

Loris

>
>> 
>> Cheers,
>> 
>> Loris
>> 
>>> Best regards
>>> 
>>> Jakob
>>> 
>>> 
>>> 
>>>> On 3 Nov 2023, at 11.08, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> We need to manage a heterogeneous cluster and I am looking at how to
>>>> organise building the software in this context.  My current idea is the
>>>> following:
>>>> 
>>>> 1. Software is created within the following directory tree
>>>> 
>>>> /nfs/easybuild/arch/x86_64/amd
>>>>                        .../amd/zen3
>>>>                        .../intel
>>>>                        .../intel/cascadelake
>>>>                        .../intel/skylake_avx512
>>>>                        .../generic
>>>> 
>>>> The paths below 'arch' correspond to those produced by
>>>> 
>>>> https://github.com/EESSI/software-layer/blob/2023.06/eessi_software_subdir.py
>>>> 
>>>> 2. When each node is booted, a systemd service creates the following
>>>> directory
>>>> 
>>>> /sw/sc/easybuild
>>>> 
>>>> and in that the following links 
>>>> 
>>>> generic -> /nfs/easybuild/arch/x86_64/generic
>>>> optimized -> /nfs/easybuild/arch/x86_64/intel/skylake_avx512
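The boot-time service in step 2 could run something along these lines; the paths are rooted in a scratch directory so the sketch is runnable, and the architecture is hard-coded where a real node would detect it (e.g. via the EESSI script above).

```shell
#!/bin/sh
# Hypothetical boot-time script for a systemd oneshot service: create
# the stable prefix with per-node links.  DEMO_ROOT and the hard-coded
# ARCH are placeholders for this sketch only.
set -e
DEMO_ROOT=$(mktemp -d)                 # empty on a real node
NFS_ROOT="$DEMO_ROOT/nfs/easybuild/arch/x86_64"
PREFIX="$DEMO_ROOT/sw/sc/easybuild"
ARCH="intel/skylake_avx512"            # assumed: detected per node

mkdir -p "$NFS_ROOT/generic" "$NFS_ROOT/$ARCH" "$PREFIX"
ln -sfn "$NFS_ROOT/generic" "$PREFIX/generic"
ln -sfn "$NFS_ROOT/$ARCH"   "$PREFIX/optimized"
```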
>>>> 
>>>> 3. Binary only software is installed via an administration node by
>>>> running EasyBuild with
>>>> 
>>>> --prefix=/sw/sc/easybuild/generic
>>>> 
>>>> Software optimized for a specific architecture is built by sending a
>>>> job via Slurm to a node with the architecture needed and using
>>>> 
>>>> --prefix=/sw/sc/easybuild/optimized
>>>> 
>>>> Does this sound plausible?  Have I overlooked anything?
>>>> 
>>>> One thing I am not quite clear on is the following:
>>>> 
>>>> What would be the best way to determine whether an EC specifies a binary
>>>> EasyBlock or whether the toolchain is 'SYSTEM' and thus the software
>>>> should be built in 'generic'?  Or would it be better to say disk space
>>>> is cheap and just install binary and 'SYSTEM' packages for each
>>>> architecture in order to simplify things?
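One rough way to make the generic-vs-optimized call would be to grep the raw easyconfig for a SYSTEM toolchain, rather than going through EasyBuild's own parser; this is only a heuristic, and the easyconfig content below is made up for the sketch.

```shell
#!/bin/sh
# Heuristic sketch: classify an easyconfig as "generic" if its toolchain
# is SYSTEM, otherwise "optimized".  A real check would use EasyBuild's
# parser; this grep only handles the common spellings.
set -e
ec=$(mktemp)
cat > "$ec" <<'EOF'
name = 'example'
version = '1.0'
toolchain = SYSTEM
EOF

if grep -Eq "^toolchain *= *(SYSTEM|\{ *'name' *: *'system')" "$ec"; then
    target=generic
else
    target=optimized
fi
echo "$target"
```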
>>>> 
>>>> Any help/comments much appreciated.
>>>> 
>>>> Cheers,
>>>> 
>>>> Loris
>>>> 
>>>> -- 
>>>> Dr. Loris Bennett (Herr/Mr)
>>>> ZEDAT, Freie Universität Berlin
>> -- 
>> Dr. Loris Bennett (Herr/Mr)
>> ZEDAT, Freie Universität Berlin
-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin