Hello Andreas,

Thanks for your response.

Yes, our hardware is a bit old; it was acquired in April 2016. But we use quite 
recent versions of Lustre, the RHEL kernel, the OS and MOFED. Here are some 
details:

Presently:
Lustre: 2.15.4 compiled from git repository
MOFED: 23.10-2.1.3.1
File server node: RHEL 8.9 kernel 4.18.0-513.24.1 patched for Lustre, all 
other non-kernel RPMs being updated on a weekly basis with the latest RHEL 8.10
Head/Compute nodes: RHEL 9.3 kernel 5.14.0-362.24.1, all other non-kernel 
RPMs being updated on a weekly basis with the latest RHEL 9.5

After a planned update next week:
Lustre: git commit a71369eb9cb0aa89ede41cb01b2cd9cdcd8e9680 (2.15.6 + 3 
patches: LU-18085 llite: use RCU to protect the dentry_data) compiled from git 
repository
MOFED: 24.10-2.1.8.0
File server node: RHEL 8.10 kernel 4.18.0-553.27.1 patched for Lustre, all 
other non-kernel RPMs being updated on a weekly basis with latest RHEL 8.10
Head/Compute nodes: RHEL 9.5 kernel 5.14.0-503.14.1, all other non-kernel RPMs 
being updated on a weekly basis with latest RHEL 9.5+

When we add the two new OSTs, perhaps in two weeks, our compute and head nodes 
will be powered OFF for other reasons. So the race condition is not a problem 
in our case. But thanks for explaining this potential race.

Now I have another question: it seems that the OSSes contact the MGS to 
announce their OSTs, and the MGS simply accepts them. I am a bit surprised 
that nothing needs to be done on the MGS side to restrict which OSS servers 
can offer OSTs. I guess it is done this way to keep the basic scenario simple. 
But if we want to improve security, is there a mechanism to restrict which 
servers can provide an OST? In our case it is very simple since the MGS, MDS 
and OSS are all running on the same server.

Thanks,

Martin

From: Andreas Dilger <[email protected]>
Sent: April 5, 2025 3:00
To: Audet, Martin <[email protected]>
Cc: [email protected]; Raymond, Stephane 
<[email protected]>
Subject: EXT: Re: [lustre-discuss] Is it that simple to add a pair of new OSTs ?


It really is that simple.

You didn't mention what version you are using, but based on the hardware and 
sizes I would assume it is not the latest.

As such, there is a race that if clients are actively creating and writing new 
files at the instant the OSTs are added, those files may be inaccessible on 
some clients for a few seconds until the new OSTs are visible on all clients.

If the clients accessing the filesystem are quiesced during the initial mount 
then there is no race. Very recent servers and clients have fixed this race.

Cheers, Andreas


On Apr 3, 2025, at 15:12, Audet, Martin via lustre-discuss 
<[email protected]> wrote:

Hello Lustre community,

We are operating a small HPC cluster (576 compute cores) using a small Lustre 
parallel filesystem (64 TB) connected by an InfiniBand EDR network. The Lustre 
filesystem is implemented by a single HPE DL380 Gen10 server acting as MGS, MDS 
and OSS. It has two 32 TB OSTs (HPE MSA 2050). As new space is required, we 
will soon install 160 TB of additional storage implemented as two 80 TB OSTs 
(HPE MSA 2060).

We looked in the Lustre documentation (10.2.1. Scaling the Lustre File System: 
https://doc.lustre.org/lustre_manual.xhtml#idm140220261007664) and made tests 
with small VMs. It appears that in our case adding this new storage would be 
very simple. From what we understand, we should do something like this:

# Create mount points for the new OSTs
mkdir /mnt/ost{2,3}

# The MGS is running on the same node as the OSTs
mgs_node="$(sed -n -e 's/^ *- *nid: *//; T; p' < /etc/lnet.conf)"

# Set the devices corresponding to the new OSTs using invariant names
ost2_device=/dev/disk/by-path/...
ost3_device=/dev/disk/by-path/...

# Create the file systems on the new OSTs
mkfs.lustre --fsname=lustrevm --mgsnode=$mgs_node --ost --index=2 $ost2_device
mkfs.lustre --fsname=lustrevm --mgsnode=$mgs_node --ost --index=3 $ost3_device

# Update fstab
cat >> /etc/fstab << _EOF_
$ost2_device /mnt/ost2 lustre defaults,_netdev 0 0
$ost3_device /mnt/ost3 lustre defaults,_netdev 0 0
_EOF_

# Mount the new OSTs
mount /mnt/ost2
mount /mnt/ost3
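
We would then verify that the new OSTs are visible from a client. A sketch of 
the checks we have in mind (/mnt/lustre is an assumed client mount point, and 
the Lustre-specific commands are guarded so the snippet only runs them where 
the client tools are installed):

```shell
# OST indexes appear in hex in target names: --index=2 becomes OST0002.
printf 'expected targets: OST%04x OST%04x\n' 2 3

# Run the Lustre-side checks only where the client tools are installed.
if command -v lfs >/dev/null 2>&1; then
    lfs df -h /mnt/lustre    # the new OSTs should be listed with their capacity
    lfs osts /mnt/lustre     # all four OSTs should be present and active
fi
```

If the filesystem default layout stripes over all available OSTs, new files 
should start landing on the new targets as well (lfs getstripe -d /mnt/lustre 
shows the directory default layout).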


This appears too simple. Are we missing something? Will new files created by 
the clients use all four OSTs with no additional effort?

Thanks in advance !

Martin Audet
_______________________________________________
lustre-discuss mailing list
[email protected]<mailto:[email protected]>
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
