Jon

Have you tried building 2.15.3 against Rocky 8.6 locally? While it is true that
Whamcloud focuses testing on the latest RHEL/Rocky minor release, many sites
still use the same Lustre software with down-rev OS versions.

Peter

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Jon 
Marshall via lustre-discuss <lustre-discuss@lists.lustre.org>
Reply-To: Jon Marshall <jon.marsh...@cruk.cam.ac.uk>
Date: Tuesday, July 11, 2023 at 8:38 AM
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] Lustre 2.15.1 server with ZFS nothing provides ksym

Hi,

I'm having a bit of a nightmare trying to build server packages for 2.15.1. I
feel like I've tried quite a few different approaches and am stumped. Most
likely I am missing something incredibly obvious, so I'd appreciate any
pointers. I'd like to point out that I am by no means an expert in any of this,
though I have spent about 4 years maintaining various Lustre builds. I am
hugely grateful for all the work you guys do on Lustre, and I hope I don't come
across as anything other than frustrated by my inability to get this version to
install!

To start, we are running Rocky 8.6, with the 4.18.0-372.9.1.el8 kernel.
As with previous Lustre builds I've done, I initially installed the
Whamcloud-provided el8_lustre kernel, along with headers, then installed the
Mellanox OFED stack from their repos, making sure to use the version that
appears to be used for 2.15.1 builds (in this case 5.6-2.0.9.0). Incidentally,
I can't find the specific OFED versions used in any compatibility matrices or
changelogs, where this information used to be provided, so I've worked
backwards from the Whamcloud repos.

In the past, I've then simply yum/dnf installed the Lustre server packages.
With this release, however, I immediately run into "Nothing provides _ksym"
errors, all of which appear to be for ZFS symbols. A quick check on the
Whamcloud Jira turns up LU-16109 <https://jira.whamcloud.com/browse/LU-16109>,
which is marked closed but references LU-16059
<https://jira.whamcloud.com/browse/LU-16059>, which says that the issue is
fixed, but in 2.15.2 rather than 2.15.1.

I'm intending to build these servers with kickstart and puppet, and I'd much
prefer to use the official repos over compiling it myself, but this is not a
hard requirement. A quick check on the Whamcloud repos suggests that 2.15.2
only supports 8.7, not 8.6. This is a bit of a problem as I'd like to keep the
same kernel version as the rest of the machines we're running where possible,
but again, not a hard requirement. I spun up a new build for 8.7 on the same
hardware, updated the Mellanox repos to point to the correct new version, and
immediately got _ksym errors, but now they appear to be for the OFED stack
instead.

In the meantime, I've seen an email on the mailing list suggesting that the
symbols are in fact provided by the package kmod-zfs, which is not provided by
the OpenZFS repos but can be built manually, so I thought I'd have another
crack at getting 2.15.1 working. I downloaded the tarball, built ZFS and
installed the resulting rpms, making sure to install devel, debugsource and
debuginfo. I then built Lustre against this and it all appeared to go ok: I
got some rpms out! However, installing them results in the exact same ksym
errors. The thing is, the ksyms appear to be present by name in /proc/kallsyms,
just not matching exactly.
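
To make the "present by name, just not matching exactly" point concrete, here
is a minimal Python sketch of how the two cases could be told apart. It
assumes the dependency lines look like "ksym(name) = 0xCHECKSUM", which is my
reading of the error messages rather than anything confirmed in this thread.
The symbol names and checksums in the sample data are made up for
illustration, and the function names are hypothetical.

```python
import re

# Assumed line format: "ksym(symbol_name) = 0x<hex checksum>".
KSYM_RE = re.compile(r"ksym\((?P<name>[^)]+)\)\s*=\s*(?P<crc>0x[0-9a-fA-F]+)")


def parse_ksyms(text):
    """Return {symbol_name: checksum} for every ksym(...) entry found in text."""
    return {m.group("name"): m.group("crc").lower() for m in KSYM_RE.finditer(text)}


def compare_ksyms(requires_text, provides_text):
    """Split required symbols into 'missing by name' and 'checksum differs'."""
    required = parse_ksyms(requires_text)
    provided = parse_ksyms(provides_text)
    missing = sorted(name for name in required if name not in provided)
    mismatched = sorted(
        name
        for name in required
        if name in provided and provided[name] != required[name]
    )
    return missing, mismatched


# Made-up sample data, not taken from any real package.
SAMPLE_REQUIRES = """\
ksym(example_sym_a) = 0x11111111
ksym(example_sym_b) = 0x22222222
ksym(example_sym_c) = 0x33333333
"""
SAMPLE_PROVIDES = """\
ksym(example_sym_a) = 0x11111111
ksym(example_sym_b) = 0xdeadbeef
"""

missing, mismatched = compare_ksyms(SAMPLE_REQUIRES, SAMPLE_PROVIDES)
print("missing by name:", missing)
print("present but checksum differs:", mismatched)
```

The two input texts could, for example, be the output of "rpm -qp --requires"
on the Lustre server rpm and "rpm -q --provides" on the installed kmod-zfs
package. Note that /proc/kallsyms lists symbol names and addresses but no
checksums, so a name showing up there does not by itself show that the
versioned dependency is satisfied.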

The main point, I guess, is that LU-16059 appears to have been closed
erroneously, as on a fresh install the issue is 100% reproducible. I also note
that the packages hosted at
<https://downloads.whamcloud.com/public/lustre/lustre-2.15.1/el8.6/server/RPMS/x86_64/>
have timestamps from 2022-08-10, but the issue was created and closed after
this. I'm happy to re-open the bug and provide as much detail as necessary, but
thought I'd check to see if anyone else has experienced this issue or if I am
indeed missing something trivial.

Thanks in advance
Jon


Jon Marshall
High Performance Computing Specialist
IT and Scientific Computing Team
Cancer Research UK Cambridge Institute
Li Ka Shing Centre | Robinson Way | Cambridge | CB2 0RE
Web <http://www.cruk.cam.ac.uk/> |
Facebook <http://www.facebook.com/cancerresearchuk> |
Twitter <http://twitter.com/CR_UK>

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
