Here at LLNL we have developed a little tool called stumpy
that also bypasses an OST without blocking.  The idea is that
we can put a "stump" OST in place of a damaged OST until we diagnose
and fix the problem on the damaged OST.  The "stump" OST is
started in a read-only/deactivated state so that no new objects
are written to the device.  This saves us from having to go out to our
many thousands of clients and deactivate the damaged OST on each one.
Data in the client caches should also be safe, thanks to some new Lustre
fixes which ensure that when an OST goes read-only, the client holds on
to its data, since the read-only state is expected to be transient.

stumpy does require changes to the ldiskfs code (to allow mounting the
filesystem in read-only mode), as well as changes to Lustre itself so
that it can start in read-only mode.

The stumpy tool takes an OST name as input.  It then creates
a "stump" OST loopback file with the settings that Lustre
expects: the last_rcvd, health_check, CATALOGS,
and LAST_ID files, along with a base Lustre filesystem.

Currently it works for lustre-1.4.8 and all prior lnet versions
(since it reads the Lustre XML configuration file).  It should not be
difficult to port to 1.6; we just haven't gotten there yet.

-Herb

_______________________________________________
Lustre-discuss mailing list
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss