Chris,

That's probably best, to be safe. By the way, if I remember right, this is one of those cases where you sometimes run fsck, let it correct things, and then have to run it again, because it will find new things to object to in the modified filesystem. So if you weren't already, run fsck repeatedly until it stops complaining. (That's a best practice anyway.)
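
A minimal sketch of the "repeat until it stops complaining" loop, assuming the MDT is unmounted and that the Lustre-patched e2fsck is the one on the PATH. The exit-status bits are the ones documented in the e2fsck man page; the function names and the pass limit are illustrative assumptions, not part of any Lustre tool.

```python
import subprocess

# e2fsck exit status bits, as documented in the e2fsck man page.
ERRORS_CORRECTED = 1
REBOOT_NEEDED = 2


def run_e2fsck(device):
    """Run one forced pass that answers 'yes' to every repair prompt.

    Caution: -y lets e2fsck modify the filesystem without asking.
    """
    return subprocess.run(["e2fsck", "-f", "-y", device]).returncode


def fsck_until_clean(device, run=run_e2fsck, max_passes=5):
    """Repeat fsck until a pass exits with status 0; return the pass count.

    `run` is injectable so the loop can be exercised without a real device.
    `max_passes` is an arbitrary safety limit (an assumption).
    """
    for n in range(1, max_passes + 1):
        status = run(device)
        if status == 0:
            return n
        # Anything beyond "errors corrected" / "reboot needed" means e2fsck
        # left errors uncorrected or failed outright, so stop and look.
        if status & ~(ERRORS_CORRECTED | REBOOT_NEEDED):
            raise RuntimeError(f"e2fsck exited with status {status} on pass {n}")
    raise RuntimeError(f"{device} still not clean after {max_passes} passes")
```

Something like `fsck_until_clean("/dev/mdtdev")` would then run the passes, where `/dev/mdtdev` is a placeholder for the real MDT device.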

I can't find a -d or -D option in my copy of fsck, so I'm not sure what it means.

Best of luck,
- Patrick

On 10/27/2015 12:52 PM, Chris Hunter wrote:
Hi Patrick,
Thanks for sharing your experience; it looks like you did the bulk of the troubleshooting in the Jira ticket.

I assume I should have a clean filesystem (i.e., run fsck first) before disabling the dirdata feature?
After I disable dirdata, will I need to run fsck with the "-D" option?

FYI, the ll_recover_lost_found_objs tool recovers files from lost+found on *OST* volumes (i.e., it moves them back into the /O/0/dXX directory) based on extended file attributes. See Section 37.5 of the HPDD manual.

thanks
chris hunter
[email protected]

On 10/27/2015 12:06 PM, Patrick Farrell wrote:
Chris,

I had the joy of taking this one apart personally. We mostly let lfsck do the repair and moved on, accepting that some of the dentries were trashed. I think, for important things, our field staff did some manual recovery with the e2fsprogs tools, but it was not a common enough problem that we documented a procedure.

If you read LU-5626 carefully, there's an explanation of the exact nature of the damage, and having that should let you make partial recoveries by hand. I'm not familiar with the ll_recover_lost_found_objs tool, but I doubt it would prove helpful in this instance.

Note that there are two forms of this corruption. The first occurs if you move a directory that was created before dirdata was enabled: the '..' entry ends up in the wrong place. This does not trouble Lustre, but fsck reports it as an error and will 'correct' it, which (usually) has the effect of overwriting one dentry in the directory when it creates a new '..' dentry in the correct location.

I don't *think* that one causes the MDT to go read only, but I could be wrong. I *think* what causes the MDT to go read only is the other problem:

When you have a non-htree directory (not too many items in it, all directory entries in a single directory block) that is in the bad state described above (with the '..' dentry in the wrong place after being moved), and that directory has enough files added to it that it becomes an htree directory, the resulting directory is corrupted more severely. We never sorted out the precise details of this. I believe we chose to simply delete any directories in this state. (I think lfsck did it for us, but can't recall for sure.)
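
To make the "'..' in the wrong place" condition concrete, here is a rough sketch that walks the entries of one raw ext-style directory block and checks whether '..' is the second entry, which is where fsck expects it. It assumes the standard little-endian dirent header (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file_type) and a block small enough that rec_len needs no special decoding. It only illustrates the layout; it is not the check e2fsck or lfsck actually performs, and both function names are hypothetical.

```python
import struct

DIRENT_HEADER = "<IHBB"  # inode, rec_len, name_len, file_type
HEADER_SIZE = struct.calcsize(DIRENT_HEADER)  # 8 bytes


def list_dentries(block):
    """Return [(offset, inode, name)] for the in-use entries of one block.

    Entries are walked by rec_len, so any extra bytes stored after the name
    (such as dirdata payloads) are skipped rather than interpreted.
    """
    entries = []
    offset = 0
    while offset + HEADER_SIZE <= len(block):
        inode, rec_len, name_len, _ftype = struct.unpack_from(
            DIRENT_HEADER, block, offset
        )
        if rec_len < HEADER_SIZE or offset + rec_len > len(block):
            break  # malformed record length; stop rather than guess
        if inode != 0:  # inode 0 marks an unused slot
            start = offset + HEADER_SIZE
            entries.append((offset, inode, block[start:start + name_len]))
        offset += rec_len
    return entries


def dotdot_is_second(block):
    """True if the first two in-use entries are '.' followed by '..'."""
    names = [name for _, _, name in list_dentries(block)]
    return len(names) >= 2 and names[0] == b"." and names[1] == b".."
```

A block could be pulled from an unmounted device with the e2fsprogs tools and passed in as bytes; a directory where `dotdot_is_second` is false but '..' shows up later in `list_dentries` would match the first form of damage described above.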

I'd advise reading LU-5626 with care, and I'd also suggest you might turn off 'dirdata' on your MDT until you have this under control. That will at least prevent any more directories from ending up in either of these bad states if you use the filesystem without updating Lustre to a version with the LU-5626 patch in it.
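
Before and after changing the feature, it may help to confirm whether 'dirdata' is actually set on the MDT. The sketch below parses the "Filesystem features:" line from the superblock summary printed by `dumpe2fs -h`. It assumes the Lustre-patched e2fsprogs, which reports the feature by the name "dirdata"; the helper names are made up for illustration.

```python
import subprocess


def parse_features(dumpe2fs_header):
    """Extract the feature names from `dumpe2fs -h` output as a set."""
    for line in dumpe2fs_header.splitlines():
        if line.startswith("Filesystem features:"):
            return set(line.split(":", 1)[1].split())
    return set()


def has_dirdata(device):
    """Return True if the device's superblock lists the dirdata feature.

    Assumes dumpe2fs is on the PATH and the caller may read the device.
    """
    out = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return "dirdata" in parse_features(out)
```

Running `has_dirdata` against the MDT device once before and once after the change gives a quick confirmation that the feature flag really went away.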

- Patrick
________________________________________
From: lustre-discuss [[email protected]] on behalf of Chris Hunter [[email protected]]
Sent: Tuesday, October 27, 2015 10:22 AM
To: [email protected]
Subject: [lustre-discuss]  recovery MDT ".." directory entries (LU-5626)

We have a Lustre 1.8 filesystem that was upgraded to Lustre 2.x, and the
"dirdata" feature was enabled. We encountered the LU-5626/LU-2638 issue with
".." directory entries. Are there established recovery steps for this
issue?

If I run fsck, the directory entries will be moved into lost+found.
I assume the next step is to run the ll_recover_lost_found_objs tool?

Can you share any advice or experience about recovery?

thanks,
chris hunter
[email protected]

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

