I've managed to solve this. After checking a few nodes in the cluster, I discovered that this particular node must have had a partial update, leaving a mismatch between the kernel version (locked at the base release) and some of the kernel support files, which appeared to be from a slightly later release. As a result, DKMS did not generate the required files.
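For anyone wanting to check other nodes for the same kind of mismatch, here is a rough sketch rather than a tested tool. The function name check_kernel_match is made up for illustration, and it assumes x86_64 nodes and the usual three-column "yum list installed" output. It only compares kernel-devel and kernel-headers against the running kernel release.

```shell
#!/bin/sh
# Hypothetical helper (not part of yum, DKMS or Lustre).
# Reads "yum list installed"-style lines on stdin and reports any
# kernel-devel / kernel-headers package whose version differs from the
# kernel release passed as $1. Assumes an x86_64 arch suffix on $1.
check_kernel_match() {
    want="${1%.x86_64}"    # uname -r includes the arch, yum's version column does not
    awk -v want="$want" '
        $1 ~ /^kernel-(devel|headers)\./ && $2 != want {
            print "MISMATCH: " $1 " " $2 " (kernel is " want ")"
            bad = 1
        }
        END { exit bad + 0 }   # non-zero exit status if anything mismatched
    '
}

# Usage on a live node:
#   yum list installed | check_kernel_match "$(uname -r)"
```

Fed the package list from the bad node below, this should flag kernel-devel and kernel-headers; on the working node it should print nothing.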
Normally I disable kernel updates in YUM so everything stays at the same release version, and I just update packages until I'm ready for a major update cycle.

Bad node:

# yum list installed | grep kernel
abrt-addon-kerneloops.x86_64   2.1.11-60.el7.centos    @anaconda
kernel.x86_64                  3.10.0-1160.el7         @anaconda
kernel-debug-devel.x86_64      3.10.0-1160.15.2.el7    @updates
kernel-devel.x86_64            3.10.0-1160.15.2.el7    @updates
kernel-headers.x86_64          3.10.0-1160.15.2.el7    @updates
kernel-tools.x86_64            3.10.0-1160.15.2.el7    @updates
kernel-tools-libs.x86_64       3.10.0-1160.15.2.el7    @updates
#

Working node:

# yum list installed | grep kernel
abrt-addon-kerneloops.x86_64   2.1.11-60.el7.centos    @anaconda
kernel.x86_64                  3.10.0-1160.el7         @anaconda
kernel-debug-devel.x86_64      3.10.0-1160.31.1.el7    @updates
kernel-devel.x86_64            3.10.0-1160.el7         @/kernel-devel-3.10.0-1160.el7.x86_64
kernel-headers.x86_64          3.10.0-1160.el7         @anaconda
kernel-tools.x86_64            3.10.0-1160.el7         @anaconda
kernel-tools-libs.x86_64       3.10.0-1160.el7         @anaconda
#

I removed the extraneous release packages and the lustre packages, updated the kernel, re-installed the kernel-headers and kernel-devel packages, and then installed the (minimal) lustre client:

# yum list installed | grep lustre
kmod-lustre-client.x86_64      2.12.7-1.el7    @/kmod-lustre-client-2.12.7-1.el7.x86_64
lustre-client.x86_64           2.12.7-1.el7    @/lustre-client-2.12.7-1.el7.x86_64
lustre-client-dkms.noarch      2.12.7-1.el7    @/lustre-client-dkms-2.12.7-1.el7.noarch
#

And all good, everything mounts and works first go as expected :)

Sid Young
Translational Research Institute
Brisbane

> ---------- Forwarded message ----------
> From: Sid Young <[email protected]>
> To: lustre-discuss <[email protected]>
> Cc:
> Bcc:
> Date: Mon, 8 Nov 2021 11:15:59 +1000
> Subject: [lustre-discuss] upgrade 2.12.6 to 2.12.7 - no lnet after reboot?
>
> I was running 2.12.6 on a HP DL385 running standard CentOS 7.9
> (3.10.0-1160.el7.x86_64) for around 6 months and decided to plan and start
> an upgrade cycle to 2.12.7, so I downloaded and installed the 2.12.7 CentOS
> release from Whamcloud using the 7.9.2009 release RPMs.
>
> # cat /etc/centos-release
> CentOS Linux release 7.9.2009 (Core)
>
> I have tried this on a node and I now have the following error after I
> rebooted:
>
> # modprobe -v lnet
> modprobe: FATAL: Module lnet not found.
>
> I suspect it's not built against the kernel, as there are 3 releases showing
> and no errors during the yum install process:
>
> # ls -la /usr/lib/modules
> drwxr-xr-x. 3 root root 4096 Mar 18 2021 3.10.0-1160.2.1.el7.x86_64
> drwxr-xr-x  3 root root 4096 Nov  8 10:32 3.10.0-1160.25.1.el7.x86_64
> drwxr-xr-x. 7 root root 4096 Nov  8 11:02 3.10.0-1160.el7.x86_64
> #
>
> Anyone upgraded this way? Any obvious gotchas I've missed?
>
> Sid Young
> Translational Research Institute
> Brisbane
