On 2010-07-03, at 15:02, [email protected] wrote:
>> Note that if you are not running with writeback cache enabled
>> on the disks, then you shouldn't have to run an fsck on the
>> filesystems after a crash.
> 
> This seems to me extremely bad advice, based on these rather
> extraordinarily optimistic assumptions:
> 
>> That should only be needed if the storage is faulty, or if it
>> is using writeback cache without mirroring and battery backup.
> 
> This reminds me of the immortal statement "as far as we know in
> our datacenter we never had an undetected error".

I think my record speaks for itself in terms of advocating running fsck on 
filesystems on a regular basis. I think you are making assumptions about what 
my statement says or does not say. What it says is that you shouldn't need to 
run fsck after a crash, unless the crash involved e.g. a RAID controller 
failure or the loss of writeback cache.

It doesn't say that you should never run fsck, and in fact I always recommend a 
full fsck in case of RAID failure or if the filesystem has detected 
inconsistencies.

My point was that if there are uptime requirements, then running a full fsck 
after an unplanned outage of one node is probably a bad use of time. It would 
be better to run a full fsck on ALL of the filesystems during scheduled 
maintenance windows, since they can be run in parallel and wouldn't take longer 
than checking a single node.
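
To make the timing argument concrete, here is a small sketch. The filesystem 
names and durations are invented for illustration, not measurements. Checks 
that run on separate nodes at the same time finish when the slowest one 
finishes, while running them one after another costs the sum.

```python
# Hypothetical fsck durations in minutes, one filesystem per server node.
# These numbers are made up purely to illustrate the arithmetic.
fsck_minutes = {"ost0": 42, "ost1": 38, "ost2": 45, "ost3": 40}


def serial_minutes(durations):
    """Wall-clock time if the filesystems are checked one after another."""
    return sum(durations.values())


def parallel_minutes(durations):
    """Wall-clock time if every node checks its own filesystem at once."""
    return max(durations.values())


print(serial_minutes(fsck_minutes))    # 165
print(parallel_minutes(fsck_minutes))  # 45
```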

I have also written the lvcheck tool to run fsck on LVM snapshots via cron on a 
regular basis so that you don't need to wait for a crash before validating 
whether your hardware is faulty.
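
For anyone unfamiliar with the approach, the general idea can be sketched as 
below. This is not the lvcheck source, only a minimal outline of the technique 
under some assumptions: an ext3/ext4 filesystem on an LVM logical volume, a 
hypothetical snapshot name and size, and root privileges when it is actually 
run. The function names are made up for this example.

```python
import subprocess


def snapshot_check_commands(vg, lv, snap_size="1G"):
    """Build the commands for a read-only check of an LVM snapshot.

    The snapshot name and default size are assumptions for illustration.
    """
    snap = f"{lv}_fscksnap"
    return {
        # Create a snapshot of the origin volume.
        "create": ["lvcreate", "-s", "-L", snap_size, "-n", snap, f"{vg}/{lv}"],
        # Force a full check, answering "no" to every repair prompt.
        "check": ["e2fsck", "-f", "-n", f"/dev/{vg}/{snap}"],
        # Drop the snapshot once the check is done.
        "remove": ["lvremove", "-f", f"{vg}/{snap}"],
        # Reset the mount count and last-checked time on the origin.
        "reset": ["tune2fs", "-C", "0", "-T", "now", f"/dev/{vg}/{lv}"],
    }


def run_snapshot_check(vg, lv, snap_size="1G", run=subprocess.run):
    """Check a snapshot; reset the origin's fsck counters only if clean."""
    cmds = snapshot_check_commands(vg, lv, snap_size)
    run(cmds["create"], check=True)
    try:
        # e2fsck exits with 0 when it found no errors.
        clean = run(cmds["check"]).returncode == 0
    finally:
        run(cmds["remove"], check=True)
    if clean:
        run(cmds["reset"], check=True)
    return clean
```

A cron job could call run_snapshot_check() for each volume and send mail when 
it returns False, which is the cue to schedule a real fsck on the origin.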

> a full scan, at least every now and then, is essential to give some
> confidence that no hidden problem has been eating the metadata.

I've been a staunch advocate among the ext4 developers for keeping the 
periodic fsck at mount time to catch those places that never fsck on their own. 
If that bothers people because of the unexpected delay in startup, I point them 
at the lvcheck script so they can check the snapshot and reset the fsck 
counters before they expire.
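
As a rough sketch of how one might see how close a filesystem is to its forced 
check, the snippet below reads the "Mount count" and "Maximum mount count" 
fields from `tune2fs -l` output. The sample text is illustrative, not taken 
from a real system, and the function name is made up for this example.

```python
def mounts_until_forced_check(tune2fs_listing):
    """Return how many mounts remain before the mount-count fsck triggers.

    Returns None when mount-count based checking is disabled, which is
    shown as a maximum mount count of 0 or -1.
    """
    fields = {}
    for line in tune2fs_listing.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    max_count = int(fields["Maximum mount count"])
    if max_count <= 0:
        return None
    return max(max_count - int(fields["Mount count"]), 0)


# Illustrative excerpt in the shape of `tune2fs -l` output.
sample = """\
Mount count:              27
Maximum mount count:      30
"""
print(mounts_until_forced_check(sample))  # 3
```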
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
