Hello!

So, are there any other complaints on the OSS node when you mount that OST? Did you try to run e2fsck on the OST disk itself (while it is unmounted)? I assume one of the possible problems is simply on-disk filesystem corruption (and that could make the OST show as unhealthy right after mount, too).
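A minimal sketch of the read-only check suggested above, assuming Lustre's patched e2fsprogs is installed on the OSS. The device and mount point are hypothetical placeholders (e.g. /dev/sdX and /mnt/ost5), as is the function name. `-f` forces a full check even if the file system is marked clean, and `-n` opens the device read-only and answers "no" to every repair prompt, so nothing on disk is modified.

```shell
#!/usr/bin/env bash
# Sketch: read-only e2fsck of an OST's backing device.
# Usage: ost_fsck_readonly <device> <mount-point>
# Both arguments and the function name are hypothetical placeholders.

ost_fsck_readonly() {
    local dev=$1 mnt=$2
    if [ -z "$dev" ] || [ -z "$mnt" ]; then
        echo "usage: ost_fsck_readonly <device> <mount-point>" >&2
        return 2
    fi
    # e2fsck should not run on a mounted file system, so stop the OST first
    # if it is still mounted at the given mount point.
    if mountpoint -q "$mnt"; then
        umount "$mnt" || return 1
    fi
    # -f: force a full check; -n: open read-only and answer "no" to all prompts.
    e2fsck -fn "$dev"
}

# Only run when both device and mount point are given on the command line.
if [ "$#" -ge 2 ]; then
    ost_fsck_readonly "$1" "$2"
fi
```

Running the read-only pass first shows what e2fsck would change before you let it repair anything.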
Bye,
Oleg

On Nov 18, 2010, at 1:47 PM, Herbert Fruchtl wrote:

> Sorry, I had meant to cc this to the list.
>
> Herbert
>
> From: Herbert Fruchtl <herbert.fruc...@st-andrews.ac.uk>
> Date: November 18, 2010 12:56:53 PM EST
> To: Kevin Van Maren <kevin.van.ma...@oracle.com>
> Subject: Re: [Lustre-discuss] Broken client
>
>
> Hi Kevin,
>
> That didn't change anything. Unmounting one of the OSTs hung (yes, with an
> LBUG), and I did a hard reboot. It came up again, and the status is as
> before: on the MDT server, I can see all files (well, I assume it's all); on
> the client in question some files appear broken. The OST is still "not
> healthy". I am running another lfsck, without much hope. Here's the LBUG:
>
> Nov 18 17:05:16 oss1-fs kernel: LustreError:
> 8125:0:(lprocfs_status.c:865:lprocfs_free_client_stats()) LBU
>
> Herbert
>
> Kevin Van Maren wrote:
>> Reboot the server with the unhealthy OST.
>> If you look at the logs, there is likely an LBUG that is causing the
>> problems.
>> Kevin
>>
>> On Nov 18, 2010, at 9:51 AM, Herbert Fruchtl
>> <herbert.fruc...@st-andrews.ac.uk> wrote:
>>>>
>>>> It looks like you may have corruption on the MDT or an OST, where the
>>>> objects on an OST can't be found for the directory entry. Have you
>>>> had a crash recently or run Lustre fsck? You might need to do fsck and
>>>> delete (unlink) the "broken" files.
>>>>
>>> The files do exist (I can see them on the MDT server) and I don't want to
>>> delete them. There was a crash lately, and I have run an lfsck afterwards
>>> (repeatedly, actually).
>>>
>>>> I suppose it's also possible you're seeing fallout from an earlier LBUG or
>>>> something. Try 'cat /proc/fs/lustre/health_check' on all the servers.
>>>>
>>> There seems to be a problem:
>>> [r...@master ~]# cat /proc/fs/lustre/health_check
>>> healthy
>>> [r...@master ~]# ssh oss1 'cat /proc/fs/lustre/health_check'
>>> device home-OST0005 reported unhealthy
>>> NOT HEALTHY
>>> [r...@master ~]# ssh oss2 'cat /proc/fs/lustre/health_check'
>>> healthy
>>> [r...@master ~]# ssh oss3 'cat /proc/fs/lustre/health_check'
>>> healthy
>>>
>>> What do I do about the unhealthy OST?
>>>
>>> Herbert
>>> --
>>> Herbert Fruchtl
>>> Senior Scientific Computing Officer
>>> School of Chemistry, School of Mathematics and Statistics
>>> University of St Andrews
>>> --
>>> The University of St Andrews is a charity registered in Scotland:
>>> No SC013532
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
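The per-server health checks run by hand in the thread can be sketched as a single loop. This is a minimal sketch, assuming passwordless ssh from the management node to every server; the host names (master, oss1, oss2, oss3 in the thread) are passed as arguments, and the function names are hypothetical. Any output other than exactly "healthy" is treated as a failure and echoed back indented.

```shell
#!/usr/bin/env bash
# Sketch: report Lustre servers whose /proc/fs/lustre/health_check is not "healthy".
# Usage: ./check_health.sh master oss1 oss2 oss3   (hypothetical script name)
# Assumes passwordless ssh to each host; function names are hypothetical.

# Succeeds only if the health_check text is exactly "healthy".
# Output such as "device home-OST0005 reported unhealthy / NOT HEALTHY" fails.
is_healthy() {
    [ "$1" = "healthy" ]
}

# Prints one status line per host; returns 1 if any host is not healthy.
check_servers() {
    local host out status=0
    for host in "$@"; do
        # $(...) strips the trailing newline from the proc file's output.
        out=$(ssh "$host" 'cat /proc/fs/lustre/health_check' 2>&1)
        if is_healthy "$out"; then
            echo "$host: healthy"
        else
            echo "$host: NOT HEALTHY"
            # Indent the raw health_check text so the failing device is visible.
            echo "$out" | sed 's/^/    /'
            status=1
        fi
    done
    return "$status"
}

# With no host arguments this does nothing.
check_servers "$@"
```

A non-zero exit status makes the loop easy to drop into a cron job or monitoring check.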