We are in the process of retiring two long-standing LFSs (about
8 years old), which we built and managed ourselves. Both use ZFS
and have the MDTs on SSDs in a JBOD that require the kind of
software-based management you describe, in our case ZFS pools
built on multipath devices. The MDT in one is ZFS; the MDT in
the other LFS is ldiskfs but uses ZFS and a zvol as you describe:
we build the ldiskfs MDT on top of the zvol. Generally, this
has worked well for us, with one big caveat. If you look for my
posts to this list and the ZFS list you'll find more details.
The short version is that we use ZFS snapshots and clones to
back up the metadata. We've run into situations where the
backup process stalls, leaving a clone hanging around. A couple
of times, the clone and the primary zvol got swapped,
effectively rolling back our metadata to the point when the
clone was created. I have tried, unsuccessfully, to recreate
that in a test environment. So if you use that kind of setup,
make sure you have good monitoring in place to detect whether
your backups/clones stall. We've kept up with Lustre and ZFS
updates over the years and are currently on Lustre 2.14 and ZFS
2.1. We've seen the gap between our ZFS MDT and ldiskfs
performance shrink to the point where they are pretty much on
par with each other now. I think our ZFS MDT performance could
be better with more hardware and software tuning, but our small
team hasn't had the bandwidth to tackle that.
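For the monitoring of stalled backup clones mentioned above, a
minimal sketch of one possible check follows. It is not the
script we run. It assumes the clones show up in the output of
`zfs list -H -p -t filesystem,volume -o name,origin,creation`
(tab-separated, creation time in epoch seconds, origin "-" for
datasets that are not clones). The function names and the
6-hour threshold are made up for illustration.

```python
import subprocess
import time


def find_stale_clones(zfs_list_output, now, max_age_seconds):
    """Return (name, origin, age_seconds) for every clone older than the limit.

    zfs_list_output is the tab-separated text produced by
        zfs list -H -p -t filesystem,volume -o name,origin,creation
    """
    stale = []
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue
        name, origin, creation = line.split("\t")
        if origin == "-":
            continue  # no origin snapshot, so this dataset is not a clone
        age = now - int(creation)
        if age > max_age_seconds:
            stale.append((name, origin, age))
    return stale


def list_datasets():
    """Run zfs list on the MDS and return its raw output."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-p", "-t", "filesystem,volume",
         "-o", "name,origin,creation"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def main(max_age_seconds=6 * 3600):
    """Print a warning per stale clone; the 6-hour limit is an arbitrary example."""
    for name, origin, age in find_stale_clones(
            list_datasets(), time.time(), max_age_seconds):
        print(f"WARNING: clone {name} of {origin} is {age / 3600:.1f} h old")
```

Calling main() from cron or from a monitoring agent would flag
any clone that outlives the expected backup window. The right
threshold depends on how long a normal backup run takes.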
Our newest LFS is vendor-provided and uses NVMe MDTs. I'm not at
liberty to talk about the proprietary way those devices are
managed. However, the metadata performance is SO much better
than on our older LFSs. There are a lot of reasons for that, but
I'd highly recommend NVMe for your MDTs.
-----Original Message-----
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org>
on behalf of Thomas Roth via lustre-discuss
<lustre-discuss@lists.lustre.org>
Reply-To: Thomas Roth <t.r...@gsi.de>
Date: Friday, January 5, 2024 at 9:03 AM
To: Lustre Diskussionsliste <lustre-discuss@lists.lustre.org>
Subject: [EXTERNAL] [BULK] [lustre-discuss] MDS hardware - NVME?
Dear all,
we are considering NVMe storage for the next MDS.
As I understand it, NVMe disks are bundled in software, not by a
hardware RAID controller.
This would be done using Linux software RAID, mdadm, correct?
We have some experience with ZFS, which we use on our OSTs.
But I would like to stick to ldiskfs for the MDTs, and a zpool
with a zvol on top which is then formatted with ldiskfs is too
much voodoo...
How is this handled elsewhere? Any experiences?
The available devices are quite large. If I create a RAID-10 out
of 4 disks, e.g. 7 TB each, my MDT will be 14 TB - already close
to the 16 TB limit.
So no need for a box with lots of U.3 slots.
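The capacity arithmetic above can be written out as a small
sketch. It assumes a RAID-10 built from two-way mirrors, so half
of the raw capacity is usable, and it takes the 16 TB MDT limit
as quoted in this message. Filesystem overhead and the TB/TiB
difference are ignored. The function name is only illustrative.

```python
def raid10_usable_tb(num_disks, disk_tb):
    """Usable capacity of a RAID-10 made of two-way mirrors, in TB."""
    if num_disks < 2 or num_disks % 2:
        raise ValueError("this sketch assumes an even number of disks")
    # Every block is stored twice, so only half the raw capacity is usable.
    return num_disks * disk_tb / 2


MDT_LIMIT_TB = 16  # limit as quoted in the message, not verified here

usable = raid10_usable_tb(4, 7)
headroom = MDT_LIMIT_TB - usable
print(f"usable: {usable} TB, headroom to limit: {headroom} TB")
```

With 4 disks of 7 TB this gives 14 TB usable and 2 TB of
headroom, which matches the numbers in the text.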
But for MDS operations, we will still need a powerful dual-CPU
system with lots of RAM.
Should the NVMe devices then be distributed between the CPUs?
Is there a way to pinpoint this in a call for tender?
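On an existing Linux box, one way to see how NVMe controllers
are spread across the CPUs is to read the NUMA node of each
controller's PCI device from sysfs. The sketch below assumes
PCIe-attached controllers that appear under /sys/class/nvme with
a device/numa_node attribute. The function name is made up, and
a node of -1 means the platform reported no NUMA affinity.

```python
from collections import defaultdict
from pathlib import Path


def nvme_numa_map(sysfs_root="/sys/class/nvme"):
    """Map NUMA node number -> list of NVMe controller names."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    by_node = defaultdict(list)
    for ctrl in sorted(root.iterdir()):
        node_file = ctrl / "device" / "numa_node"
        try:
            node = int(node_file.read_text().strip())
        except (OSError, ValueError):
            continue  # e.g. a controller without a PCI parent device
        by_node[node].append(ctrl.name)
    return dict(by_node)


def main():
    """Print one line per NUMA node with the controllers attached to it."""
    for node, controllers in sorted(nvme_numa_map().items()):
        print(f"NUMA node {node}: {', '.join(controllers)}")
```

A result such as two controllers on node 0 and two on node 1
would indicate an even split between the sockets. The same check
could be run as an acceptance test on delivered hardware.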
Best regards,
Thomas
--------------------------------------------------------------------
Thomas Roth
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany,
https://urldefense.us/v3/__http://www.gsi.de/__;!!G2kpM7uM-TzIFchu!1QmOnUbmSPpZPcc39XFZ3S-Vk4Dmh-Q78Gpm8ylYUf6Zhv_zpb2VXkM4C5Uhh05x01MhjqJTYZ5boqzEhkx6JF_rGY74EQ$
Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB
1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des
GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
https://urldefense.us/v3/__http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org__;!!G2kpM7uM-TzIFchu!1QmOnUbmSPpZPcc39XFZ3S-Vk4Dmh-Q78Gpm8ylYUf6Zhv_zpb2VXkM4C5Uhh05x01MhjqJTYZ5boqzEhkx6JF9_AFR58A$