What errors are indicated in the kernel ring buffer on the client (dmesg)?
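If it helps, a common way to pull the Lustre-related messages out of the ring buffer is something like the following. This is a sketch: the grep pattern and the sample log line piped through it are my own illustration, not output from the affected system.

```shell
# On the hung client you would filter the live kernel log, e.g.:
#   dmesg | grep -iE 'lustre|lnet|ll_|evict'
# Self-contained illustration: pipe one hypothetical eviction-style line
# through the same filter to show the kind of message it would surface.
printf 'LustreError: lustre-OST0004-osc: This client was evicted\n' \
  | grep -iE 'lustre|lnet|ll_|evict'
```

Evictions, timeouts on specific OST/MDT connections, or LNet errors in that output usually point at which target (or network path) the hang is stuck on.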
On Wed, Feb 22, 2023 at 10:56 PM Sid Young via lustre-discuss <[email protected]> wrote:
> Hi all,
>
> I've been running lustre 2.12.6 (clients are 2.12.7) on HP gear for
> nearly 2 years and had an odd crash requiring a reboot of all nodes. I
> have lustre /home and /lustre file systems, and I've been able to remount
> them on the clients after restarting the MGS/MDT and OSS nodes, but on
> any client, when I do an ls -la on the /lustre file system it locks
> solid. /home appears to be OK for the directories and sub-directories I
> tested.
>
> I am very rusty on Lustre now, but I logged into another node and ran the
> following:
>
> [root@n04 ~]# lfs check osts
> home-OST0000-osc-ffff9f3b26547800 active.
> home-OST0001-osc-ffff9f3b26547800 active.
> home-OST0002-osc-ffff9f3b26547800 active.
> home-OST0003-osc-ffff9f3b26547800 active.
> lustre-OST0000-osc-ffff9efd1e392800 active.
> lustre-OST0001-osc-ffff9efd1e392800 active.
> lustre-OST0002-osc-ffff9efd1e392800 active.
> lustre-OST0003-osc-ffff9efd1e392800 active.
> lustre-OST0004-osc-ffff9efd1e392800 active.
> lustre-OST0005-osc-ffff9efd1e392800 active.
> [root@n04 ~]# lfs check mds
> home-MDT0000-mdc-ffff9f3b26547800 active.
> lustre-MDT0000-mdc-ffff9efd1e392800 active.
> [root@n04 ~]# lfs check servers
> home-OST0000-osc-ffff9f3b26547800 active.
> home-OST0001-osc-ffff9f3b26547800 active.
> home-OST0002-osc-ffff9f3b26547800 active.
> home-OST0003-osc-ffff9f3b26547800 active.
> lustre-OST0000-osc-ffff9efd1e392800 active.
> lustre-OST0001-osc-ffff9efd1e392800 active.
> lustre-OST0002-osc-ffff9efd1e392800 active.
> lustre-OST0003-osc-ffff9efd1e392800 active.
> lustre-OST0004-osc-ffff9efd1e392800 active.
> lustre-OST0005-osc-ffff9efd1e392800 active.
> home-MDT0000-mdc-ffff9f3b26547800 active.
> lustre-MDT0000-mdc-ffff9efd1e392800 active.
> [root@n04 ~]#
>
> [root@n04 ~]# lfs df -h
> UUID                  bytes    Used  Available  Use%  Mounted on
> home-MDT0000_UUID      4.2T  217.5G       4.0T    6%  /home[MDT:0]
> home-OST0000_UUID     47.6T   42.5T       5.1T   90%  /home[OST:0]
> home-OST0001_UUID     47.6T   44.6T       2.9T   94%  /home[OST:1]
> home-OST0002_UUID     47.6T   41.9T       5.7T   88%  /home[OST:2]
> home-OST0003_UUID     47.6T   42.2T       5.4T   89%  /home[OST:3]
>
> filesystem_summary:  190.4T  171.2T      19.1T   90%  /home
>
> UUID                  bytes    Used  Available  Use%  Mounted on
> lustre-MDT0000_UUID    5.0T   53.8G       4.9T    2%  /lustre[MDT:0]
> lustre-OST0000_UUID   47.6T   42.3T       5.3T   89%  /lustre[OST:0]
> lustre-OST0001_UUID   47.6T   41.8T       5.8T   88%  /lustre[OST:1]
> lustre-OST0002_UUID   47.6T   41.3T       6.3T   87%  /lustre[OST:2]
> lustre-OST0003_UUID   47.6T   42.3T       5.3T   89%  /lustre[OST:3]
> lustre-OST0004_UUID   47.6T   43.7T       3.9T   92%  /lustre[OST:4]
> lustre-OST0005_UUID   47.6T   40.1T       7.4T   85%  /lustre[OST:5]
>
> filesystem_summary:  285.5T  251.5T      34.0T   89%  /lustre
>
> [root@n04 ~]#
>
> Is it worth remounting everything and hoping crash recovery works, or are
> there some specific checks I can make?
>
> Sid Young
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
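On the "specific checks" question: rather than remounting and hoping, recovery progress can be read directly from the server-side recovery_status parameters. A sketch, assuming a 2.12-era MGS/MDT + OSS layout; the sample output piped through awk below is illustrative, not from this cluster:

```shell
# On the MDS:   lctl get_param mdt.*.recovery_status
# On each OSS:  lctl get_param obdfilter.*.recovery_status
# "status: COMPLETE" means client replay finished; "status: RECOVERING"
# with a time_remaining counter means clients are still reconnecting.
# Self-contained illustration of extracting the status field from such output:
printf 'status: COMPLETE\nrecovery_start: 1677100000\n' \
  | awk '/^status:/ {print $2}'
# prints COMPLETE
```

If every target reports COMPLETE but an ls -la on /lustre still hangs, the client-side dmesg output asked about above is the next place to look.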
