I'm happy to report that the problem seems to be solved by deleting the
CATALOGS file on the underlying MDT ZFS fs. As I gather from the manual
[1] this should not be a problem, because it will be handled by LFSCK.
If I'm wrong about this, please let me know. Also, I'm happy to provide
any information from this MDT to help assess if there is a bug somewhere.
LFSCK is running as we speak.
Cheers,
Hans Henrik
[1] https://doc.lustre.org/lustre_manual.xhtml#backup_fs_level.restore
On 11.03.2022 12.49, Hans Henrik Happe via lustre-discuss wrote:
I tried running tunefs.lustre --erase-params --writeconf on the targets.
I guess this is not ideal because the clients were not unmounted, but I
made sure they were not trying to connect.
This makes it possible to mount the MDT, but when the first OST mount
starts the MDT has a lot of errors. After starting the second OST the
MDS crashes (syslog attached).
Cheers,
Hans Henrik
On 10.03.2022 15.48, Hans Henrik Happe via lustre-discuss wrote:
Sorry for all the mail load, but I hope this info can help in figuring
out what's wrong and determining if this was caused by a bug. I think
I read the CONFIGS on the MDT with llog_reader. See attachments.
Cheers,
Hans Henrik
On 10.03.2022 12.23, Hans Henrik Happe via lustre-discuss wrote:
After upgrading to Lustre 2.12.8 I found that the first mount after
a reboot behaves differently:
Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT0000
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT0000
failed: No space left on device
And a different syslog output (attached syslog-0).
Doing the mount again has this error:
Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT0000
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT0000
failed: File exists
And a syslog like the one first posted. Attached the new output in
syslog-1.
Finally, stopping Lustre (only the MGS in this case) and the lnet
service does not free resources, making lustre_rmmod fail:
# lustre_rmmod
rmmod: ERROR: Module osp is in use
Cheers,
Hans Henrik
On 10.03.2022 11.15, Hans Henrik Happe via lustre-discuss wrote:
Forgot to say this is Lustre 2.12.6 and CentOS 7.9
(3.10.0-1160.6.1.el7.x86_64).
On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:
Hi,
A reboot of the MDS stalled and the node was forcibly reset. After that
the MDS would not start. The syslog is attached.
I'm not sure what the "class_register_device())
astro-OST0002-osc-MDT0000" part is supposed to do but
astro-OST0002 is not mounted at this time. I guess this comes from
the MGS.
Cheers,
Hans Henrik
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org