Here at LLNL we have developed a small tool called stumpy that also bypasses an OST without blocking. The idea is to put a "stump" OST in place of a damaged OST until we can diagnose and fix the problem on the damaged one. The stump OST is started in a read-only/deactivated state, so no new objects are written to the device. This saves us from having to go out to our many thousands of clients and deactivate the damaged OST on each one. The data in the client caches should also be safe, given some new Lustre fixes that ensure a client holds its data when an OST goes read-only, since that state is expected to be transient.
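As a rough illustration only (this is not stumpy itself, whose source is not shown in this post), the Python sketch below covers the two most mechanical parts of the approach: creating a backing file that a loopback device could use, and checking which of the files Lustre expects are still missing. The four file names are the ones listed later in this post. The function names, the image path, and the 1 MiB size are hypothetical, and the location of each file inside the filesystem is not stated here, so only names are tracked.

```python
import os
import tempfile

# File names taken from this post. Their paths inside the stump filesystem
# are not given in the post, so this sketch tracks names only.
STUMP_FILES = ("last_rcvd", "health_check", "CATALOGS", "LAST_ID")


def create_backing_file(path, size_bytes):
    """Create a file of size_bytes (sparse where the filesystem supports it)
    that could later back a loopback device. Returns the resulting size."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # extends the file without writing data blocks
    return os.path.getsize(path)


def missing_stump_files(present):
    """Return the expected stump files that are not in `present`,
    in the order they appear in STUMP_FILES."""
    seen = set(present)
    return [name for name in STUMP_FILES if name not in seen]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    image = os.path.join(workdir, "stump.img")  # hypothetical image name
    print(create_backing_file(image, 1 << 20))
    print(missing_stump_files(["last_rcvd"]))
```

Formatting the image, populating it, and starting Lustre on it read-only are left out on purpose, since they depend on the ldiskfs and Lustre changes described in this post.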
stumpy does require changes to the ldiskfs code (to allow mounting the filesystem in read-only mode) as well as Lustre code changes to allow Lustre to start in read-only mode. The stumpy tool takes an OST name as input. It then creates a "stump" OST loopback file with certain settings that Lustre expects: the last_rcvd, health_check, CATALOGS, and LAST_ID files, along with a base Lustre filesystem. Currently it works with lustre-1.4.8 and all prior lnet versions, since it reads the Lustre XML file. It should not be difficult to port to 1.6; we are just not there yet.

-Herb
