Hello Carlos,

Your hardware is interesting. It is more powerful than ours.


In May of last year we performed an upgrade from CentOS 7.9 with Lustre 2.12.4 to 
Lustre 2.15.4 with RHEL 8.9 (for the Lustre file server) and RHEL 9.3 (for the 
head and compute nodes).


It was a big update. We were very nervous.


We spent a lot of time preparing this general update (almost everything, 
including firmware, was updated), as we had no auxiliary system to "practice" 
on (except for some VMs). We spent a lot of time scripting the installation 
process completely, from the installation .iso with kickstart to the node in 
its final state (in three flavors: file server, head node, or compute node), 
across multiple reboot steps, in about 30 minutes, and possibly in parallel. 
The scripts use our compiled Lustre and MOFED RPMs at every step, and we 
developed a repository system in which the custom Lustre and MOFED RPMs can 
hide the corresponding RPMs of the distribution, while still allowing weekly 
updates for RPMs unrelated to the kernel, Lustre, or MOFED. All of this was 
worth the effort: the update this year using these improved mechanisms was 
much faster and smoother. I believe that compiling Lustre, and being able to 
choose which git commit to use, was also worth the additional effort, as it 
improves compatibility.
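
For reference, the "hiding" mechanism can be done with the dnf repository 
"priority" option: when the same package name exists in two repositories, the 
repo with the lower priority number wins. A minimal sketch (the repo name and 
local path below are made up for illustration, not our actual configuration):

```ini
# /etc/yum.repos.d/local-lustre.repo  (hypothetical example)
# With the dnf "priority" option, a lower number wins, so locally built
# Lustre/MOFED/kernel RPMs shadow same-named RPMs from the distribution
# repos, while every other package still comes from the distribution.
[local-lustre]
name=Locally compiled Lustre and MOFED RPMs
baseurl=file:///opt/repos/local-lustre
enabled=1
gpgcheck=0
priority=10

# Distribution repos keep the default priority (99), so a weekly
# "dnf update" still pulls all non-kernel/non-Lustre/non-MOFED updates.
```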
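
The compile-from-git step can be sketched roughly as follows. This is a hedged 
outline, not our exact procedure: the --with-o2ib path is a typical MOFED 
source location that may differ on your system, and the tag name is only an 
example (any commit works, possibly with cherry-picked fixes):

```shell
# Sketch: build Lustre server RPMs from a chosen commit of the git tree,
# against the running kernel and the installed MOFED, so that kernel,
# MOFED and Lustre stay consistent.
git clone https://github.com/lustre/lustre-release.git
cd lustre-release
git checkout v2_15_4                 # or any commit; optionally cherry-pick fixes
sh autogen.sh
./configure --enable-server \
            --with-linux=/usr/src/kernels/$(uname -r) \
            --with-o2ib=/usr/src/ofa_kernel/default   # MOFED kernel sources
make rpms                            # produces lustre-*, kmod-lustre-* RPMs
```

The resulting RPMs then go into the local repository described above.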


I am interested in your problem. When you find the solution, please publish 
it, as it could help the community.


Thanks,


Martin Audet

________________________________
From: Carlos Adean <[email protected]>
Sent: April 28, 2025 4:15 PM
To: Audet, Martin
Cc: [email protected]; Eloir Troyack
Subject: EXT: Re: Re: [lustre-discuss] Installing lustre 2.15.6 server on 
rhel-8.10 fails


Hi Martin,

I really appreciate the help.

My answers are inline below.


One question: are you using the precompiled Lustre RPMs (e.g. those available 
from https://downloads.whamcloud.com/public/lustre/lustre-2.15.6/ ), or are 
you compiling your own RPMs from the Lustre git repository 
( https://github.com/lustre/lustre-release )?

In our case we use the second approach and I think it is better for two reasons:


1- You make sure that everything is consistent, especially with your MOFED 
environment

2- You are not forced to use the exact versions corresponding to tags; you can 
choose any revision available in the git repository, or cherry-pick the fixes 
you think are useful (more details on this later).

Precompiled RPMs.


In our case we upgraded a small HPC cluster last week, using RHEL 8 for the 
file server and RHEL 9 for the clients. The update was successful, and so far 
we have had no problems related to MOFED, Lustre, PMIx, Slurm, or MPI 
(including MPI-IO).

Your upgrade scenario is similar to ours. We’re upgrading our servers from RHEL 
7 with Lustre 2.12.6 to RHEL 8.10 with Lustre 2.15.x. The clients previously 
ran RHEL 7 and will now run RHEL 9.5.


Our upgrade is described in a message posted on this mailing list on April 7th:


http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-April/019471.html

Actually, our Lustre environment is a bit complex. It has approximately 570 TB 
of capacity, organized into two tiers: T0 (70 TB) and T1 (500 TB).

Its infrastructure is composed of two MDS servers connected to a Dell ME4024 
storage array, and four OSS servers. Two of these OSS nodes are equipped with 
NVMe SSDs and provide the T0 tier (high-performance scratch space), while the 
other two OSS nodes are connected via SAS to two ME4084 storage arrays, 
supporting the T1 tier (long-term data). The entire system operates with high 
availability (HA) and load balancing (LB) mechanisms.


Cheers,

---
Carlos Adean
https://www.linea.org.br

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
