Re: [lustre-discuss] proper procedure after MDT kernel panic

E.S. Rosenberg Thu, 11 Aug 2016 16:57:07 -0700

How long should I expect
e2fsck --mdsdb
to run on a 1 TB MDT?
(So far it has been running for quite a few hours.)
Thanks,
Eli


On Thu, Aug 11, 2016 at 12:42 PM, E.S. Rosenberg <[email protected]
> wrote:

> Hi all,
> Our MDT suffered a kernel panic (which I will post separately). The OSSs
> stayed alive, but the MDT was down for some time while nodes were still
> trying to interact with Lustre.
>
> So I have several questions:
> a. What happens to processes that are reading/writing during such an event
> (does it make a difference if they already hold handles on the OSS, for
> instance)? I noticed several of our compute nodes ended up filling their
> swap/RAM, so I assume some level of caching happens until the MDT returns.
>
> b. What is the best/proper procedure now to ensure filesystem integrity?
> Should I take the filesystem offline and run lfsck, first on the MDT and
> then on the OSSs?
>
> Most documents I can find with Google on the subject are spread over
> various old wikis, so it is not clear to me how relevant they still are.
> Thanks,
> Eli
>
> Specs:
> Server OS: CentOS 6.4 + Lustre 2.5.3 from RPMs (1 MGS/MDS + 3 OSS)
> Clients: Debian testing/unstable, kernel 4.2.8 + Lustre 2.8.0 built from
> source.
> Network: InfiniBand FDR (o2ib)
>

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
