On 2010-05-20, at 20:25, Mervini, Joseph A wrote:
> We encountered a multi-disk failure on one of our mdadm RAID6 8+2 OSTs. 2 
> drives failed in the array within the space of a couple of hours and were 
> replaced.

I guess the need for +3 parity is closer than we think...

> Fortunately I am able (at least for now) to assemble the array with the 
> existing 8/10 arrays and am able to fsck, mount via ldiskfs and lustre and am 
> in the process of copying files from the vulnerable OST to a backup location 
> using "lfs find --obd <target> /scratch|cpio -puvdm ..."

I'm assuming at this point you also have the OST in question deactivated on the 
MDS (lctl --device N deactivate), so that it is not getting new files as well?

If you track the original files that were successfully copied, you could rename 
the new files back over top of the old ones, and remove any trace of the old 
file.
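
A rough sketch of that rename step, assuming you kept a list of
(original, copy) pairs for the files that copied cleanly and that the copies
live in the same Lustre filesystem as the originals (rename cannot cross
filesystems).  The function name and the pair list are only illustrative:

```python
import os

def rename_copies_back(pairs):
    """Move each successful copy over its original path.

    pairs: iterable of (original_path, copy_path) tuples, i.e. a
    hypothetical log of the files that were copied without error.
    Returns the list of originals that were replaced.
    """
    replaced = []
    for original, copy in pairs:
        if not os.path.isfile(copy):
            # No usable copy (e.g. the read failed); leave the original alone.
            continue
        # os.replace() is an atomic rename within one filesystem, so the
        # old file is unlinked in the same step.
        os.replace(copy, original)
        replaced.append(original)
    return replaced
```

Anything not returned in the list still points at the old objects and needs
another look.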

Another option would be to use the "lfs_migrate" script (see bugzilla), which 
essentially does this, with a data check in between.  Note that it isn't 
totally safe for a live system, since it has no way to know which files are in 
use while it is copying them, but I'm assuming at this point that is irrelevant.

> My question is: What is the best way to restore the OST? Obviously I will 
> need to somehow restore the array to its full 8+2 configuration. Whether we 
> need to start from scratch or use some other means, that is our first 
> priority. But I would like to make the recovery as transparent to the users 
> as possible. 
> 
> One possible option that we are considering is simply removing the OST from 
> Lustre, fixing the array and copying the recovered files to a newly created 
> OST (not desirable).

I'd try to avoid this option, since it leaves the old OST around forever.  If you 
decide to erase the old OST, one option is to just copy over the base config 
files (/CONFIGS/*, /CATALOGS, /O/0/LAST_ID, /last_rcvd) to the new OST.  This 
of course should be done after migrating or otherwise deleting the objects that 
are on this OST.

> Another is to fix the OST (not remove it from Lustre), delete the files that 
> exist  and then copy the recovered files back. The problem that comes to mind 
> in either scenario is what happens if a file is part of a striped file? Does 
> it lose its affinity with the rest of the stripe?

I'm not sure what you mean by "affinity" here.  If you copy the file to a new 
file it will normally get the default striping, but there is no way from 
userspace to "break" the striping of a file.  If an object on that OST is 
missing, then the copy will return EIO for that file and you need to restore it 
from backup.

Note there is a lustre-patched tar which would keep the original file striping. 
Any tool can preserve the file striping via xattrs, and doesn't have to know 
anything about Lustre internals:

/* buf is a caller-supplied buffer of at least 65536 bytes */
xattr_size = getxattr("/path/to/file", "lustre.lov", buf, 65536);
mknod("/path/to/new_file", S_IFREG | 0644, 0);
setxattr("/path/to/new_file", "lustre.lov", buf, xattr_size, 0);
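
The same three calls as a small Python sketch, for anyone scripting this
rather than writing C.  The function name and the error handling are my
assumptions here, not part of any Lustre tool; the xattr name is the one from
the snippet above.  On a non-Lustre filesystem it simply reports that there
was nothing to copy:

```python
import errno
import os
import stat

LOV_XATTR = "lustre.lov"  # attribute name taken from the snippet above

def copy_striping(src, dst, name=LOV_XATTR):
    """Create dst as an empty file carrying src's layout xattr.

    Returns True if the attribute was copied, False if src has no such
    attribute or the filesystem does not support it (e.g. not Lustre).
    """
    try:
        layout = os.getxattr(src, name)
    except OSError as exc:
        if exc.errno in (errno.ENODATA, errno.ENOTSUP, errno.EOPNOTSUPP):
            return False
        raise
    # Create the new file with mknod rather than open(), as in the snippet
    # above, so the layout is set before any data is written.
    os.mknod(dst, stat.S_IFREG | 0o644)
    os.setxattr(dst, name, layout)
    return True
```

The data itself still has to be copied into the new file afterwards; this
only carries the striping over.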

> Another scenario that we are wondering about is if we mount the OST via 
> ldiskfs and copy everything on the file system to a backup location, fix the 
> array maintaining the same tunefs.lustre configuration, then move everything 
> back using the same method as it was backed up, will the files be presented 
> to lustre (mds and clients) just as it was before when mounted as a lustre 
> file system? 

The easiest option is to just do a block-device-level copy of the whole 
filesystem to a new LUN, and then run e2fsck on that.  Next best is to format a 
new OST with mkfs.lustre, set the label by hand to match the old OST name via 
"tune2fs -L {fsname}-OSTNNNN", then mount it via ldiskfs and copy the files 
over.
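
To spell out the label placeholder: as far as I recall NNNN is the OST index
as four hex digits, so index 10 becomes OST000a.  A small sketch that builds
the label and the tune2fs arguments; the fsname "scratch" and the device path
below are made-up examples, and the helper names are hypothetical:

```python
def ost_label(fsname, index):
    """Build a label of the form {fsname}-OSTNNNN (index as 4 hex digits)."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("OST index out of range")
    return "%s-OST%04x" % (fsname, index)

def tune2fs_label_cmd(fsname, index, device):
    """Return the argument list for setting the label on the new OST device."""
    return ["tune2fs", "-L", ost_label(fsname, index), device]

# Hypothetical example values; substitute your own fsname, index and device.
print(" ".join(tune2fs_label_cmd("scratch", 10, "/dev/md0")))
```

Check the result against the label of the old OST (e2label or
"tunefs.lustre --print") before running it.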

Note when copying (or backing up and restoring) the objects on the OST you 
should preserve the xattrs using some tool that can handle this (e.g. RHEL tar, 
or rsync 3.x) since there is recovery information stored in the object xattrs.

The OST xattrs are not needed for normal operation, but if you have disk 
corruption and can run e2fsck and then ll_recover_lost_found_objs you'll be 
happy to get your data back.
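
If you want to confirm that a copy really kept the xattrs, something along
these lines would do as a spot check after mounting both sides via ldiskfs.
This is only a sketch with names of my own choosing; reading the xattrs on
the objects will most likely need root:

```python
import os

def read_xattrs(path):
    """Return {name: value} for all xattrs on path ({} if none or unreadable)."""
    try:
        return {n: os.getxattr(path, n, follow_symlinks=False)
                for n in os.listxattr(path, follow_symlinks=False)}
    except OSError:
        return {}

def xattr_mismatches(src_root, dst_root, reader=read_xattrs):
    """List (relative_path, xattr_name) pairs lost or changed in the copy."""
    problems = []
    for dirpath, _dirs, files in os.walk(src_root):
        for fname in files:
            src = os.path.join(dirpath, fname)
            rel = os.path.relpath(src, src_root)
            dst_attrs = reader(os.path.join(dst_root, rel))
            # Every xattr on the source object must exist, unchanged, on the copy.
            for name, value in reader(src).items():
                if dst_attrs.get(name) != value:
                    problems.append((rel, name))
    return sorted(problems)
```

An empty list means every xattr found on the source objects is present and
identical on the copy.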

The clients and OST code will not be able to tell the difference between the 
old and replacement OSTs.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
