OK, managed to move an OSS to another node. Roughly:

1. Deactivated the broken OSS.
2. Used dump to copy the raw OST to another system while Lustre was live and serving our cluster. The dump took ~30 hours for 3 TB.
3. Once the dump had finished, shut down all Lustre clients except one.
4. Used that one client to compute md5 checksums on a random ~10% of the files on the broken OST and saved the results.
5. Unmounted the final client, then stopped Lustre on all OSSs and the MDS.
6. Mounted the broken OSS's OST as an ext filesystem (same on the temporary system) and did a final rsync. This took about 15 minutes.
7. Shut down the broken OSS and brought the temporary system up in its place.
8. Restarted Lustre with one client and re-checked the md5 checksums to make sure the files had been copied reliably.
9. Got back to work.
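The checksum sampling in steps 4 and 8 might look something like the sketch below. The helper names and the output-file path are mine, not from the procedure above; the 10% random sample matches what was described:

```shell
# Record md5sums for roughly 10% of the files under $1 (randomly chosen)
# into the checksum file $2. Keep $2 outside the tree being scanned.
sample_md5() {
    find "$1" -type f | awk 'BEGIN { srand() } rand() < 0.10' \
        | xargs -r -d '\n' md5sum > "$2"
}

# Re-check the recorded sums; run this once the replacement OSS is serving
# the OST. Exits non-zero if any file differs.
verify_md5() {
    md5sum -c --quiet "$1"
}
```

Run `sample_md5 /mnt/lustre /root/ost-sums.txt` from the single remaining client before the final rsync, then `verify_md5 /root/ost-sums.txt` once the temporary system is in place.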

Stu.


On 06/06/2007, at 4:52 PM, Stephen Willey wrote:

I inquired about this a while back and got the following:

"In order to minimize downtime, it would also be possible to use the ext2
"dump" program in order to do device-level backups (including the
extended attributes) while the filesystem is in use. This backup would
not be 100% coherent with the actual filesystem.

The problem with running rsync to do the final sync step is that this
has no understanding of extended attributes.  For current (1.4.8)
versions of Lustre the OST EAs are not required for the correct
operation of the filesystem (they are redundant information to assist
recovery in case of corruption), but in the future that may not be true."

The (offline) method of migrating an OST is here: https://bugzilla.clusterfs.com/show_bug.cgi?id=4633 but after reading the above I guess you should probably run the getfattr/setfattr commands on the OSTs as well as the MDT.
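For what it's worth, the EA save/restore step could be sketched roughly as below. The helper names and paths are mine (not from bug 4633); getfattr/setfattr are the standard tools from the attr package, run against the OST mounted as ext, not as Lustre:

```shell
# Save all extended attributes (including the trusted.* Lustre EAs, which
# require root) under a directory tree, recorded relative to that directory.
dump_ea() {      # $1 = source mount point, $2 = backup file (absolute path)
    (cd "$1" && getfattr -R -d -m '.*' -e hex -P . > "$2")
}

# Re-apply the saved attributes onto the copied tree.
restore_ea() {   # $1 = destination mount point, $2 = backup file
    (cd "$1" && setfattr --restore="$2")
}
```

For example, `dump_ea /mnt/ost_old /root/ea.bak` before the final rsync, then `restore_ea /mnt/ost_new /root/ea.bak` after it.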

Stephen

--
Dr Stuart Midgley
[EMAIL PROTECTED]



_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
