OK, managed to move an OSS to another node. Roughly:
Deactivated the broken OSS. Used dump to copy the raw OST to another
system while Lustre was live and serving our cluster. Once the dump
had finished (~30 hrs for 3 TB), shut down all Lustre clients except
one. Used that client to compute md5 checksums on a random ~10% of
the files on the broken OST and saved the results. Unmounted the
final client, stopped Lustre on all OSSs and the MDS. Mounted the
broken OSS's OST as an ext filesystem (same on the temporary system)
and did a final rsync, which took about 15 mins. Then shut down the
broken OSS and brought the temporary system up in its place.
Restarted Lustre with one client and checked the md5 checksums to
make sure the files had been copied reliably. Then got back to work.
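The random spot-check step might be sketched roughly like this; the
mount point, output file, and sampling method are my assumptions, not
taken from the post:

```shell
#!/bin/sh
# Minimal sketch of a ~10% random md5 spot-check.  MOUNT and SUMFILE are
# illustrative; point MOUNT at the real Lustre client mount.
MOUNT=${MOUNT:-/mnt/lustre}
SUMFILE=${SUMFILE:-/tmp/ost-md5sums.txt}

# awk's rand() keeps roughly 1 file in 10; tune the threshold as needed.
find "$MOUNT" -type f 2>/dev/null | awk 'rand() < 0.10' \
    | xargs -r md5sum > "$SUMFILE" 2>/dev/null

# After the migration, re-run from the same client to verify the copy:
#   md5sum -c "$SUMFILE"
```

After the final rsync and swap, md5sum -c reports OK or FAILED for each
sampled file.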
Stu.
On 06/06/2007, at 4:52 PM, Stephen Willey wrote:
I inquired about this a while back and got the following:
"In order to minimize downtime, it would also be possible to use the
ext2 "dump" program in order to do device-level backups (including
the extended attributes) while the filesystem is in use. This backup
would not be 100% coherent with the actual filesystem.
The problem with running rsync to do the final sync step is that this
has no understanding of extended attributes. For current (1.4.8)
versions of Lustre the OST EAs are not required for the correct
operation of the filesystem (they are redundant information to assist
recovery in case of corruption), but in the future that may not be
true."
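The device-level copy the quote describes might look roughly like the
sketch below. The device name, host, and target directory are all
assumptions, and the guard just keeps the sketch from touching a device
that isn't there; dump(8)/restore(8) are the ext2/ext3 tools the quote
refers to:

```shell
#!/bin/sh
# Hedged sketch: stream a level-0 ext2/ext3 dump of the OST device to the
# temporary system while Lustre is still live.  All names are illustrative.
SRC_DEV=${SRC_DEV:-/dev/sdb1}       # OST block device on the broken OSS
DEST_HOST=${DEST_HOST:-temp-oss}    # temporary replacement system
DEST_DIR=${DEST_DIR:-/mnt/ost-copy} # filesystem mounted on the target

if [ -b "$SRC_DEV" ]; then
    # dump preserves the extended attributes; the result is still not 100%
    # coherent with the live filesystem, hence the final offline rsync.
    dump -0 -f - "$SRC_DEV" | ssh "$DEST_HOST" "cd '$DEST_DIR' && restore -rf -"
else
    echo "SRC_DEV=$SRC_DEV is not a block device; set it before running" >&2
fi
```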
The (offline) method of migrating an OST is here:
https://bugzilla.clusterfs.com/show_bug.cgi?id=4633
but after reading the above I guess you should probably run the
getfattr/setfattr commands on the OSTs as well as the MDT.
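The getfattr/setfattr step mentioned above might be sketched like this.
The /tmp path and OST_ROOT variable are my assumptions; the -e hex
encoding keeps binary EA values intact across the round trip:

```shell
#!/bin/sh
# Hedged sketch: save all extended attributes from a mounted OST (or MDT)
# and replay them on the copy.  Run from the root of the mounted device.
command -v getfattr >/dev/null 2>&1 || { echo "attr tools not installed" >&2; exit 0; }

cd "${OST_ROOT:-.}" || exit 1

# -R recurse, -d dump values, -m '.*' match every attribute namespace,
# -e hex hex-encode the values, -P don't follow symlinks.
getfattr -R -d -m '.*' -e hex -P . > /tmp/ea.bak

# On the new OST, from the same relative root:
#   setfattr --restore=/tmp/ea.bak
```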
Stephen
--
Dr Stuart Midgley
[EMAIL PROTECTED]
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss