After upgrading to Lustre 2.12.8 I found that the first mount after a reboot behaves differently:

Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT0000
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT0000 failed: No space left on device

The syslog output is also different (attached as syslog-0).

Doing the mount again has this error:

Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT0000
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT0000 failed: File exists

The syslog then looks like the one I first posted. The new output is attached as syslog-1.
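The two messages printed by mount.lustre match the return codes in the kernel log (-28 and -17). A minimal Python sketch, assuming Linux errno numbering, that decodes such codes; the helper name decode_rc is made up for illustration and is not part of Lustre:

```python
import errno
import os


def decode_rc(rc):
    """Translate a negative kernel return code into (errno name, message)."""
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names such as ENOSPC.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the text that tools like mount.lustre print.
    return name, os.strerror(code)


for rc in (-28, -17):
    name, message = decode_rc(rc)
    print(f"rc = {rc}: {name} ({message})")
```

On a Linux system this should report ENOSPC for -28 and EEXIST for -17, which is what the first and second mount attempts show.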

Finally, stopping Lustre (only the MGS in this case) and the lnet service does not free all resources, which makes lustre_rmmod fail:

# lustre_rmmod
rmmod: ERROR: Module osp is in use
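One way to see why a module is reported as in use is to look at its reference count and dependent modules in /proc/modules. The sketch below parses text in that format. The sample lines (sizes, counts, addresses) are hypothetical and were not taken from this system, and module_users is a made-up helper name:

```python
def module_users(proc_modules_text, module):
    """Return (refcount, dependent modules) for a module, or None if absent.

    Expects the /proc/modules layout: name, size, refcount, dependents, ...
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == module:
            # The refcount is "-" on kernels built without module unloading.
            refcount = int(fields[2]) if fields[2].isdigit() else None
            # Dependents are comma separated with a trailing comma, or "-".
            users = [u for u in fields[3].split(",") if u and u != "-"]
            return refcount, users
    return None


# Hypothetical sample; on a real node read the text from /proc/modules instead.
SAMPLE = (
    "osp 357122 1 - Live 0xffffffffc0a00000\n"
    "lnet 595941 6 osp,ptlrpc, Live 0xffffffffc0800000\n"
)

print(module_users(SAMPLE, "osp"))
print(module_users(SAMPLE, "lnet"))
```

A nonzero count with no dependent modules would point at a reference held inside the kernel rather than by another module. Whether that is the osp device left over from the failed mount is only a guess.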


Cheers,
Hans Henrik

On 10.03.2022 11.15, Hans Henrik Happe via lustre-discuss wrote:
Forgot to say this is Lustre 2.12.6 and CentOS 7.9 (3.10.0-1160.6.1.el7.x86_64).

On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:
Hi,

A reboot of the MDS stalled and it had to be forcibly reset. After that the MDS would not start. The syslog is attached.

I'm not sure what the "class_register_device()) astro-OST0002-osc-MDT0000" part is supposed to do, but astro-OST0002 is not mounted at this time. I guess this comes from the MGS.

Cheers,
Hans Henrik
Mar 10 12:08:15 mds02 kernel: Lustre: MGS: Connection restored to 3be12548-8d1b-39d8-1ec0-0381833f8bc2 (at 172.20.200.30@tcp1)
Mar 10 12:08:15 mds02 kernel: Lustre: Skipped 42 previous similar messages
Mar 10 12:08:33 mds02 kernel: Lustre: 5191:0:(llog_cat.c:93:llog_cat_new_log()) astro-OST0002-osc-MDT0000: there are no more free slots in catalog [0x2:0x1:0x0]:0
Mar 10 12:08:33 mds02 kernel: LustreError: 5191:0:(osp_sync.c:1524:osp_sync_init()) astro-OST0002-osc-MDT0000: can't initialize llog: rc = -28
Mar 10 12:08:33 mds02 kernel: LustreError: 5191:0:(obd_config.c:559:class_setup()) setup astro-OST0002-osc-MDT0000 failed (-28)
Mar 10 12:08:33 mds02 kernel: LustreError: 5191:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: cfg command failed: rc = -28
Mar 10 12:08:33 mds02 kernel: Lustre:    cmd=cf003 0:astro-OST0002-osc-MDT0000  1:astro-OST0002_UUID  2:172.21.10.116@tcp
Mar 10 12:08:33 mds02 kernel: LustreError: 15c-8: MGC10.21.10.102@o2ib: The configuration from log 'astro-MDT0000' failed (-28). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Mar 10 12:08:33 mds02 kernel: LustreError: 5131:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server astro-MDT0000: -28
Mar 10 12:08:33 mds02 kernel: LustreError: 5131:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -28
Mar 10 12:08:33 mds02 kernel: Lustre: Failing over astro-MDT0000
Mar 10 12:08:33 mds02 kernel: Lustre: server umount astro-MDT0000 complete
Mar 10 12:08:33 mds02 kernel: LustreError: 5131:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-28)

Mar 10 12:10:56 mds02 kernel: LustreError: 5622:0:(genops.c:556:class_register_device()) astro-OST0002-osc-MDT0000: already exists, won't add
Mar 10 12:10:56 mds02 kernel: LustreError: 5622:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: cfg command failed: rc = -17
Mar 10 12:10:56 mds02 kernel: Lustre:    cmd=cf001 0:astro-OST0002-osc-MDT0000  1:osp  2:astro-MDT0000-mdtlov_UUID
Mar 10 12:10:56 mds02 kernel: LustreError: 15c-8: MGC10.21.10.102@o2ib: The configuration from log 'astro-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Mar 10 12:10:56 mds02 kernel: LustreError: 5566:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server astro-MDT0000: -17
Mar 10 12:10:56 mds02 kernel: LustreError: 5566:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -17
Mar 10 12:10:56 mds02 kernel: Lustre: Failing over astro-MDT0000
Mar 10 12:10:56 mds02 kernel: Lustre: server umount astro-MDT0000 complete
Mar 10 12:10:56 mds02 kernel: LustreError: 5566:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-17)
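
To compare the two logs it can help to pull out the source file, line, and function of each Lustre message. A rough Python sketch, assuming each message sits on a single line in the "pid:N:(file:line:func()) text" layout seen above; the regular expression and the name parse_lustre_line are my own and cover only that layout:

```python
import re

# Matches e.g. "LustreError: 5191:0:(osp_sync.c:1524:osp_sync_init()) text".
# The second numeric field is matched but not interpreted.
LUSTRE_MSG = re.compile(
    r"(?P<level>Lustre(?:Error)?): (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)


def parse_lustre_line(line):
    """Return a dict of fields, or None when the line has no source location."""
    match = LUSTRE_MSG.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


# Two messages copied from the logs above, joined onto single lines.
first = ("Mar 10 12:08:33 mds02 kernel: LustreError: "
         "5191:0:(osp_sync.c:1524:osp_sync_init()) "
         "astro-OST0002-osc-MDT0000: can't initialize llog: rc = -28")
second = ("Mar 10 12:10:56 mds02 kernel: LustreError: "
          "5622:0:(genops.c:556:class_register_device()) "
          "astro-OST0002-osc-MDT0000: already exists, won't add")

for entry in (first, second):
    info = parse_lustre_line(entry)
    print(info["file"], info["line"], info["func"], "->", info["msg"])
```

Lines without a source location, such as the "15c-8" summary message, return None and can be skipped.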
