Great to read that this issue was solved.

The procedure given by Martin is really comprehensive. It helps a lot to have 
all the steps listed, thanks!


My recent Lustre installs took a slightly shorter route, mostly because we are 
content with the kernel and OFED from the distro and use the DKMS packages 
instead of the precompiled modules.
This procedure must come from this mailing list or perhaps some Lustre wiki 
article, but unfortunately I don't remember whom to credit.

So, according to my notes starting with Rocky 8.8, I

- installed the system, Rocky 8.x, with OFED and all
- installed the kernel-devel package (creates "/usr/src/kernels/4.18...." etc.)
- got the ext4 sources by installing the kernel source RPM; in the case of Rocky 8.8 that 
was "kernel-4.18.0-477.15.1.el8_8.src.rpm"
- unpacked and copied the ext4 sources:
"tar xJf ~/rpmbuild/SOURCES/linux-4.18.0-477.15.1.el8_8.tar.xz"
"cp -a linux-4.18.0-477.15.1.el8_8/fs/ext4/* 
/usr/src/kernels/4.18.0-477.15.1.el8_8.x86_64/fs/ext4/" - the copy complains about two 
existing files, which are identical anyway
- installed all the usual development dependencies: python-whatnot, libnl, ...
- installed only the DKMS package at first:
"dnf install lustre-ldiskfs-dkms"
- and, if that was successful, installed "lustre" and "lustre-osd-ldiskfs-mount"

The odd step of manually copying the ext4 sources around is only necessary if 
you run ldiskfs, of course.
We run ZFS on our OSSs, where "dnf install lustre-zfs-dkms" followed by "dnf install 
lustre lustre-osd-zfs-mount" is all we need.
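
For reference, the ldiskfs route above can be condensed into a short script. The 
kernel version is the Rocky 8.8 example from my notes, and RUN=echo makes it a 
dry run by default, so treat it as a sketch rather than something to paste blindly:

```shell
#!/bin/sh
# Sketch of the DKMS-based ldiskfs install described above.
# Dry run by default; set RUN= (empty) to actually execute.
RUN="${RUN:-echo}"
KVER="4.18.0-477.15.1.el8_8"   # Rocky 8.8 example; adjust to your kernel

# kernel headers: creates /usr/src/kernels/$KVER.x86_64
$RUN dnf install -y "kernel-devel-$KVER"

# ext4 sources from the kernel source RPM, copied into the kernel-devel tree
$RUN rpm -i "kernel-$KVER.src.rpm"
$RUN tar xJf "$HOME/rpmbuild/SOURCES/linux-$KVER.tar.xz"
$RUN cp -a "linux-$KVER/fs/ext4/." "/usr/src/kernels/$KVER.x86_64/fs/ext4/"

# the DKMS package first; if the module build succeeds, the rest
$RUN dnf install -y lustre-ldiskfs-dkms
$RUN dnf install -y lustre lustre-osd-ldiskfs-mount
```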

Cheers,
Thomas


On 5/5/25 14:54, Eloir Troyack via lustre-discuss wrote:
Hi Martin,

I'm resending this message because I wasn't subscribed to the list, and it's 
important to share this feedback with the community.

I'm working with Carlos on this Lustre upgrade, and I've been directly involved 
in the installation and troubleshooting process.

The issue turned out to be the absence of the "kernel-modules" package that 
corresponds to the modified Lustre kernel. It appears this package is required 
for the disk controller and InfiniBand drivers to function properly.

After installing the package (available from the Whamcloud repository), we no 
longer saw any warnings during installation or encountered any filesystem 
issues at boot.

Even though the "kernel-modules" package appears to be optional, it turned out 
to be essential for us, and I believe it would be essential in most cases where 
someone installs Lustre from the pre-compiled packages.
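
For anyone hitting the same symptom, installing the matching package might look 
like the sketch below; matching the package to `uname -r` and having the 
Whamcloud repository already configured on the node are my assumptions:

```shell
#!/bin/sh
# Dry-run sketch (set RUN= to execute): install the kernel-modules package
# matching the running Lustre-patched kernel, assuming the Whamcloud
# repository is already configured.
RUN="${RUN:-echo}"
KREL="$(uname -r)"   # release string of the running kernel

$RUN dnf install -y "kernel-modules-$KREL"
```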

We're currently running performance tests to determine whether we'll need to 
build and install the MOFED drivers to get the most out of the InfiniBand 
network.

Thanks a lot for your input and interest in our issue — it was really helpful!

Best regards,

Eloir Troyack


On Mon, Apr 28, 2025 at 6:43 PM, Audet, Martin <[email protected]> wrote:

    Hello Carlos,


    Your hardware is interesting. It is more powerful than ours.


    Last year in May we performed an upgrade from CentOS 7.10 with Lustre 
2.12.4 to Lustre 2.15.4 with RHEL 8.9 (for the Lustre file server) and RHEL 9.3 
(for the head and compute nodes).


    It was a big update. We were very nervous.


    We spent a lot of time preparing this general update (almost everything, 
including firmware, was updated), as we had no auxiliary system to "practice" 
on (except some VMs). We spent a lot of time scripting the installation process 
completely, from the installation .iso with kickstart to the node in its final 
state (in three flavors: file server, head node, or compute node), across 
multiple reboot steps (in about 30 minutes) and possibly in parallel. This 
included using our compiled Lustre and MOFED RPMs at every step and developing 
a repository system in which the custom Lustre or MOFED RPMs hide the 
corresponding RPMs of the distribution while still allowing weekly updates of 
non-kernel, non-Lustre, non-MOFED RPMs. All of this was worth the effort: this 
year's update using these improved mechanisms was much faster and smoother. I 
believe that compiling Lustre yourself and choosing which git commit to use is 
also worth the additional effort, as it improves compatibility.
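
    A repository that "hides" the distribution's RPMs can be approximated with 
dnf repository priorities. The fragment below is only my guess at such a setup; 
the repo name and paths are hypothetical:

```
# /etc/yum.repos.d/local-lustre.repo -- hypothetical example
[local-lustre]
name=Local Lustre/MOFED RPMs
baseurl=file:///srv/repo/lustre
enabled=1
gpgcheck=0
priority=10        # lower wins; distro repos default to 99
module_hotfixes=1  # allow these kernel RPMs past modular filtering
```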


    I am interested in your problem. When you find the solution, please publish 
it, as it can help the community.


    Thanks,


    Martin Audet

    
    ------------------------------------------------------------------------
    *From:* Carlos Adean <[email protected]>
    *Sent:* April 28, 2025 4:15 PM
    *To:* Audet, Martin
    *Cc:* [email protected]; Eloir Troyack
    *Subject:* EXT: Re: Re: [lustre-discuss] Installing lustre 2.15.6 server on 
rhel-8.10 fails

    Hi Martin,

    I really appreciate the help.

    My answers are inline below.

        One question: are you using the precompiled Lustre RPMs (e.g. those available from 
https://downloads.whamcloud.com/public/lustre/lustre-2.15.6/ ) or are you compiling 
your own RPMs from the Lustre git repository ( https://github.com/lustre/lustre-release )?

        In our case we use the second approach, and I think it is better for two 
reasons:

            1- You make sure that everything is consistent, especially with 
your MOFED environment.

            2- You are not forced to use exactly the versions corresponding to 
tags; you can choose any version available in the git repository or 
cherry-pick the fixes you think are useful (more details on this later).
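
For readers who want to try this second approach, the build-from-git route might 
look roughly like the dry-run sketch below; the tag is an example and 
"<fix-commit>" is a placeholder, not a real commit:

```shell
#!/bin/sh
# Dry run by default (RUN=echo); set RUN= to execute for real.
RUN="${RUN:-echo}"
TAG="2.15.6"   # example tag; any commit in the repository can be used

$RUN git clone https://github.com/lustre/lustre-release.git
$RUN git -C lustre-release checkout "$TAG"
$RUN git -C lustre-release cherry-pick "<fix-commit>"   # placeholder hash

# then, inside the checkout: generate configure, point it at your kernel
# headers, and build the RPMs
$RUN sh autogen.sh
$RUN ./configure --with-linux="/usr/src/kernels/$(uname -r)"
$RUN make rpms
```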

    Precompiled RPMs.

        In our case, last week we upgraded a small HPC cluster using RHEL 8 for 
the file server and RHEL 9 for the clients. The update was successful, and so 
far we have had no problems related to MOFED, Lustre, PMIx, Slurm, or MPI 
(including MPI-IO).


    Your upgrade scenario is similar to ours. We’re upgrading our servers from 
RHEL 7 with Lustre 2.12.6 to RHEL 8.10 with Lustre 2.15.x. The clients 
previously ran RHEL 7 and will now run RHEL 9.5.

        Our upgrade is described in a message posted on this mailing list on 
April 7th:

            http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-April/019471.html


    Actually, our Lustre environment is a bit complex. It has approximately 570 
TB of capacity, organized into two tiers: T0 (70 TB) and T1 (500 TB).

    Its infrastructure is composed of two MDS servers connected to a Dell 
ME4024 storage array, and four OSS servers. Two of these OSS nodes are equipped 
with NVMe SSDs and provide the T0 tier (high-performance scratch space), while 
the other two OSS nodes are connected via SAS to two ME4084 storage arrays, 
supporting the T1 tier (long-term data). The entire system operates with high 
availability (HA) and load balancing (LB) mechanisms.


    Cheers,

    ---
    *Carlos Adean*
    www.linea.org.br



--
Eloir G. S. Troyack
Service Desk - LIneA
www.linea.org.br

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Prof. Dr. Thomas Nilsson, Dr. Katharina Stummeyer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz

