Hi Jakob,

Jakob Schiotz <ja...@schiotz.dk> writes:
> Sorry for being late in replying, the spam filter held your mail.

No problem.

>> On 6 Nov 2023, at 15.21, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
>>
>> Hi Jakob,
>>
>> Jakob Schiotz <ja...@schiotz.dk> writes:
>>
>>> At our cluster, the different nodes mount the /home/modules folder
>>> from different file systems, so they each contain the modules for the
>>> appropriate architecture. The advantage is that if a script or
>>> something else (like a Python venv) contains a path to a file in a
>>> module, it will always point to an executable built for the right
>>> architecture.
>>
>> So under /home/modules you also have the software? Or do you have the
>> whole software tree mounted separately, with the architecture-specific
>> modules just containing paths which point to the right part of the
>> software tree?
>
> /home/modules contains everything built by EasyBuild. Yes, there is
> some unnecessary duplication, but it is easier not to have to know
> which modules need to be architecture-specific, and just build
> everything for all four architectures.

I am also coming round to the idea that accepting a certain amount of
duplication is the pragmatic way to go.

>>> If the folder names are different, then I do not think that you can
>>> build a venv on one architecture and use it on another. And that is
>>> often very important for our production runs, where different jobs as
>>> part of the same project are submitted to different types of nodes.
>>
>> I don't understand the point about different folder names. I was
>> thinking that /sw/sc/easybuild/{generic,optimized} would point to a
>> subdirectory of the total EasyBuild tree which contains the modules
>> and software for the given node.
>
> Yes, if you have some link magic that gives you a path that is always
> the same, regardless of architecture, then things will work. What
> will not work is if the path to the Python executable is different on
> different nodes; then virtual environments break.
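The "link magic" discussed above could be sketched roughly as follows. This is only an illustration: the function name, the paths, and the way the architecture subdirectory is obtained are all assumptions, not the actual setup.

```shell
#!/bin/sh
# Hypothetical sketch of a boot-time script (e.g. run from a systemd
# oneshot unit) that creates stable, architecture-independent links.
#
# setup_eb_links NFS_ROOT LINK_DIR ARCH_SUBDIR
#
#   NFS_ROOT    root of the per-architecture trees, e.g. /nfs/easybuild/arch
#   LINK_DIR    stable path seen by modules and venvs, e.g. /sw/sc/easybuild
#   ARCH_SUBDIR this node's subdirectory, e.g. x86_64/intel/skylake_avx512
#               (as produced by something like eessi_software_subdir.py)
setup_eb_links() {
    nfs_root=$1
    link_dir=$2
    arch_subdir=$3

    mkdir -p "$link_dir" || return 1

    # -sfn replaces any existing links left over from a previous boot
    ln -sfn "$nfs_root/x86_64/generic" "$link_dir/generic" || return 1
    ln -sfn "$nfs_root/$arch_subdir" "$link_dir/optimized" || return 1
}
```

On a Skylake node the service would then call something like `setup_eb_links /nfs/easybuild/arch /sw/sc/easybuild x86_64/intel/skylake_avx512`, so that `/sw/sc/easybuild/optimized` resolves to the right tree regardless of where a venv was created.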
The software would always be built on a compute node, where, say,

  /sw/sc/easybuild/software/Python/3.11.3-GCCcore-12.3.0/bin

would actually point to something like

  /nfs/easybuild/arch/x86_64/intel/skylake_avx512/software/Python/3.11.3-GCCcore-12.3.0/bin

on one node. On another it might point to

  /nfs/easybuild/arch/x86_64/amd/zen3/software/Python/3.11.3-GCCcore-12.3.0/bin

The modules would always contain the '/sw/sc/easybuild' path, so I
would hope that any venvs would be OK.

>>> https://wiki.fysik.dtu.dk/Niflheim_system/EasyBuild_modules/#setting-the-cpu-hardware-architecture
>>
>> Because the file system which contains, amongst other things,
>> EasyBuild is already mounted, I was planning to use links rather than
>> something like autofs.
>
> I am sure you can do it that way, too.
>
> Best regards
>
> Jakob

Thanks for the information.

Cheers,

Loris

>
>>
>> Cheers,
>>
>> Loris
>>
>>> Best regards
>>>
>>> Jakob
>>>
>>>
>>>
>>>> On 3 Nov 2023, at 11.08, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We need to manage a heterogeneous cluster and I am looking at how to
>>>> organise building the software in this context. My current idea is
>>>> the following:
>>>>
>>>> 1. Software is created within the following directory tree
>>>>
>>>>      /nfs/easybuild/arch/x86_64/amd
>>>>                            .../amd/zen3
>>>>                            .../intel
>>>>                            .../intel/cascadelake
>>>>                            .../intel/skylake_avx512
>>>>                            .../generic
>>>>
>>>>    The paths below 'arch' correspond to those produced by
>>>>
>>>>      https://github.com/EESSI/software-layer/blob/2023.06/eessi_software_subdir.py
>>>>
>>>> 2. When each node is booted, a systemd service creates the directory
>>>>
>>>>      /sw/sc/easybuild
>>>>
>>>>    and in it the following links:
>>>>
>>>>      generic -> /nfs/easybuild/arch/x86_64/generic
>>>>      optimized -> /nfs/easybuild/arch/x86_64/intel/skylake_avx512
>>>>
>>>> 3.
>>>> Binary-only software is installed via an administration node by
>>>> running EasyBuild with
>>>>
>>>>      --prefix=/sw/sc/easybuild/generic
>>>>
>>>>    Software optimized for a specific architecture is built by
>>>>    sending a job via Slurm to a node with the required architecture
>>>>    and using
>>>>
>>>>      --prefix=/sw/sc/easybuild/optimized
>>>>
>>>> Does this sound plausible? Have I overlooked anything?
>>>>
>>>> One thing I am not quite clear on is the following:
>>>>
>>>> What would be the best way to determine whether an EC specifies a
>>>> binary EasyBlock or whether the toolchain is 'SYSTEM', and thus the
>>>> software should be built in 'generic'? Or would it be better to say
>>>> disk space is cheap and just install binary and 'SYSTEM' packages
>>>> for each architecture in order to simplify things?
>>>>
>>>> Any help/comments much appreciated.
>>>>
>>>> Cheers,
>>>>
>>>> Loris
>>>>
>>>> --
>>>> Dr. Loris Bennett (Herr/Mr)
>>>> ZEDAT, Freie Universität Berlin
>>
>> --
>> Dr. Loris Bennett (Herr/Mr)
>> ZEDAT, Freie Universität Berlin

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
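As a side note on the question above of spotting easyconfigs that use the 'SYSTEM' toolchain (and are therefore candidates for the 'generic' tree), a crude check could be sketched like this. Note that easyconfigs are Python files, so a grep is only a heuristic; a robust answer would come from EasyBuild's own Python API. The function name is hypothetical.

```shell
#!/bin/sh
# Heuristic sketch only: succeed if the easyconfig declares
# "toolchain = SYSTEM" (older ECs may instead use a dict such as
# {'name': 'system', 'version': 'system'}, which this does not catch).
is_system_toolchain() {
    grep -Eq '^toolchain[[:space:]]*=[[:space:]]*SYSTEM' "$1"
}
```

One could then loop over a directory of `*.eb` files and route each one to the 'generic' or 'optimized' prefix accordingly, rather than deciding by hand.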