Hello!

  So are there any other complaints on the OSS node when you mount that OST?
  Did you try to run e2fsck on the OST disk itself (while unmounted)? I assume
one of the possible problems is simply on-disk filesystem corruption
  (and the OST might show as unhealthy right after mount because of that, too).
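
  A minimal sketch of that check, assuming the OST is ldiskfs-backed and the
  Lustre-patched e2fsprogs is installed on the OSS. OST_DEV is a hypothetical
  placeholder for the real block device, and the helper name
  describe_e2fsck_status is made up for illustration. The -n flag opens the
  filesystem read-only and answers "no" to every prompt, so nothing on disk is
  changed; -f forces a full check even if the filesystem is marked clean. The
  exit status is a bitmask as documented in e2fsck(8).

```shell
#!/bin/sh
# Sketch only: read-only e2fsck of an UNMOUNTED OST device, then decode
# the exit status. OST_DEV is a hypothetical placeholder, not from the thread.

# Decode e2fsck's exit status (a bitmask, see e2fsck(8)) into words.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean"
        return
    fi
    msg=""
    [ $((status & 1)) -ne 0 ] && msg="$msg errors-corrected"
    [ $((status & 2)) -ne 0 ] && msg="$msg reboot-needed"
    [ $((status & 4)) -ne 0 ] && msg="$msg errors-left-uncorrected"
    [ $((status & 8)) -ne 0 ] && msg="$msg operational-error"
    [ $((status & 16)) -ne 0 ] && msg="$msg usage-error"
    [ $((status & 32)) -ne 0 ] && msg="$msg cancelled"
    [ $((status & 128)) -ne 0 ] && msg="$msg shared-library-error"
    # Strip the leading space left by the first append.
    echo "${msg# }"
}

# Run the check only when a device was supplied, e.g.:
#   OST_DEV=/dev/sdX sh check_ost.sh
if [ -n "${OST_DEV:-}" ]; then
    rc=0
    e2fsck -fn "$OST_DEV" || rc=$?
    describe_e2fsck_status "$rc"
fi
```

  With -n nothing is repaired, so a corrupted filesystem would be expected to
  report "errors-left-uncorrected" (status 4) rather than "errors-corrected".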

Bye, 
    Oleg

On Nov 18, 2010, at 1:47 PM, Herbert Fruchtl wrote:

> Sorry, I had meant to cc this to the list.
> 
>  Herbert
> 
> From: Herbert Fruchtl <herbert.fruc...@st-andrews.ac.uk>
> Date: November 18, 2010 12:56:53 PM EST
> To: Kevin Van Maren <kevin.van.ma...@oracle.com>
> Subject: Re: [Lustre-discuss] Broken client
> 
> 
> Hi Kevin,
> 
> That didn't change anything. Unmounting one of the OSTs hung (yes, with an 
> LBUG), and I did a hard reboot. It came up again, and the status is as 
> before: on the MDT server, I can see all files (well, I assume it's all); on 
> the client in question some files appear broken. The OST is still "not 
> healthy". I am running another lfsck, without much hope. Here's the LBUG:
> 
> Nov 18 17:05:16 oss1-fs kernel: LustreError: 
> 8125:0:(lprocfs_status.c:865:lprocfs_free_client_stats()) LBU
> 
>  Herbert
> 
> Kevin Van Maren wrote:
>> Reboot the server with the unhealthy OST.
>> If you look at the logs, there is likely an LBUG that is causing the 
>> problems.
>> Kevin
>> On Nov 18, 2010, at 9:51 AM, Herbert Fruchtl 
>> <herbert.fruc...@st-andrews.ac.uk> wrote:
>>>> 
>>>> It looks like you may have corruption on the mdt or an ost, where the
>>>> objects on an OST can't be found for the directory entry. Have you
>>>> had a crash recently or run Lustre fsck? You might need to do fsck and
>>>> delete (unlink) the "broken" files.
>>>> 
>>> The files do exist (I can see them on the mdt server) and I don't want to
>>> delete them. There was a crash lately, and I have run an lfsck afterwards
>>> (repeatedly, actually).
>>> 
>>>> I suppose it's also possible you're seeing fallout from an earlier LBUG or
>>>> something. Try 'cat /proc/fs/lustre/health_check' on all the servers.
>>>> 
>>> There seems to be a problem:
>>> [r...@master ~]# cat /proc/fs/lustre/health_check
>>> healthy
>>> [r...@master ~]# ssh oss1 'cat /proc/fs/lustre/health_check'
>>> device home-OST0005 reported unhealthy
>>> NOT HEALTHY
>>> [r...@master ~]# ssh oss2 'cat /proc/fs/lustre/health_check'
>>> healthy
>>> [r...@master ~]# ssh oss3 'cat /proc/fs/lustre/health_check'
>>> healthy
>>> 
>>> What do I do about the unhealthy OST?
>>> 
>>> Herbert
>>> -- 
>>> Herbert Fruchtl
>>> Senior Scientific Computing Officer
>>> School of Chemistry, School of Mathematics and Statistics
>>> University of St Andrews
>>> -- 
>>> The University of St Andrews is a charity registered in Scotland:
>>> No SC013532
> 
> -- 
> Herbert Fruchtl
> Senior Scientific Computing Officer
> School of Chemistry, School of Mathematics and Statistics
> University of St Andrews
> --
> The University of St Andrews is a charity registered in Scotland:
> No SC013532
> 
> 
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
