[ ... ]

>> We do the fsck from the command line and look at the output.
>> If there were no filesystem modifications (this is the usual
>> case), we then start the Lustre services interactively.

> Note that if you are not running with writeback cache enabled
> on the disks, then you shouldn't have to run an fsck on the
> filesystems after a crash.

This seems to me extremely bad advice, based on these rather
extraordinarily optimistic assumptions:

> That should only be needed if the storage is faulty, or if it
> is using writeback cache without mirroring and battery backup.

This reminds me of the immortal statement "as far as we know in
our datacenter we never had an undetected error".

How do you know that "storage is faulty", or any of the many
other reasons why metadata can get corrupted, never happened?

'fsck' does metadata auditing and garbage collection, and a full
scan, at least every now and then, is essential to give some
confidence that no hidden problem has been eating the metadata.

And if there is a way to at least sample check data integrity
(e.g. run 'gzip -t' on a subset of compressed files) I would run
that periodically too.

Experience with storage systems induces distrust, never mind
CERN's experiences:

  http://storagemojo.com/2007/09/19/cerns-data-corruption-research/

Admittedly "happy go lucky", as the investment banks have shown
in the past several years with derivatives, can be a profitable
strategy (until it blows up :->).

[ ... ]
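
To make the 'gzip -t' sample check mentioned above a bit more
concrete, here is a rough sketch in Python. It assumes the files to
test are plain gzip files ending in '.gz' somewhere under one
directory tree; the function names, the sample size and the path in
the usage note are my own placeholders, nothing Lustre-specific.

```python
import gzip
import random
import zlib
from pathlib import Path


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file and discard the output, much like 'gzip -t'."""
    try:
        with gzip.open(path, "rb") as fh:
            # Reading to the end forces the CRC and length trailer to be checked.
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches (and plain I/O errors),
        # EOFError covers truncated files, zlib.error covers damaged streams.
        return False


def sample_check(root, sample_size=100, seed=None):
    """Test a random subset of the *.gz files under root; return the failures."""
    files = sorted(Path(root).rglob("*.gz"))
    rng = random.Random(seed)
    chosen = rng.sample(files, min(sample_size, len(files)))
    return [p for p in chosen if not gzip_is_intact(p)]
```

Something like sample_check("/mnt/lustre/archive", sample_size=500)
run from cron on a client would then return the list of sampled
files that no longer decompress cleanly. An empty list proves
little, of course, but a non-empty one is a good reason to look
harder at the storage.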
