Re: [Lustre-discuss] Fsck downtime estimates

Peter Grandi Sat, 28 Apr 2007 12:43:21 -0700

>>> On Fri, 27 Apr 2007 14:13:56 -0600, Andreas Dilger
>>> <[EMAIL PROTECTED]> said:


[ ... 'fsck' times ... ]

adilger> While the 1-3h per TB is reasonable, what is important to
adilger> note is that this checking happens IN PARALLEL for lustre.
adilger> If you have 500 2TB OSTs = 1PB, then you can still check
adilger> all of them in 2-4 hours.

Ahhh, interesting. But that only holds if they are on separate
hosts; I, for example, have only one host, with 12TB on a RAID10.

My main reason to look at Lustre is not to take advantage of the
cluster based parallelism, but to have 6x2TB OSTs on the same
machine and hope that if there are active updates to only one then
only one needs 'fsck'ing. Basically my main reason is to reduce
post-crash service unavailability due to 'fsck'.

My particular application would have 12TB of 20-80MB files, let's
say around 200,000-700,000 inodes in total.
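
To put rough numbers on that, here is a minimal sketch in Python. It
assumes the 1-3h per TB figure quoted above scales linearly with size,
and that an OST with no active updates needs no check after a crash;
both are assumptions, not measurements. The function name is made up
for illustration.

```python
def fsck_hours(tb_to_check, hours_per_tb=(1.0, 3.0)):
    """Return (best, worst) hours to check tb_to_check TB serially.

    Assumes check time scales linearly with size, using the
    1-3h per TB range quoted in this thread.
    """
    low, high = hours_per_tb
    return (tb_to_check * low, tb_to_check * high)

# One 12TB filesystem: the whole thing is checked after a crash.
whole = fsck_hours(12)

# Six 2TB OSTs on one machine, only one of them assumed dirty.
one_ost = fsck_hours(2)

print("single 12TB filesystem:", whole, "hours")
print("one dirty 2TB OST:     ", one_ost, "hours")
```

Under those assumptions the post-crash outage shrinks by the ratio of
dirty to total capacity, here a factor of six.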

adilger> CFS has also recently developed patches to improve the
adilger> e2fsck speed for ext3 filesystems by 2-20x (depends on
adilger> filesystem usage). What used to take 1h to check has been
adilger> shown for production filesystems to take only 10
adilger> minutes...

Well, that would be nice, but also sounds a bit implausible.
Production filesystems tend to be full, with metadata scattered
all over the place, and 'ext3' has quite a bit of scattered
metadata and becomes very fragmented quite rapidly.

>> Could someone from CFS suggest a sort of formula to calculate
>> the fsck downtime in a more accurate manner? This is often
>> important when planning for service levels. If a file system is
>> spread over multiple OSTs, which fsck operations run in
>> parallel? May metadata checking be parallelized?

adilger> Yes, the OST and MDS e2fsck checking can be done in
adilger> parallel.
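
Absent an official formula, one crude model would be the following
sketch. It assumes targets on different hosts are checked fully in
parallel, targets sharing a host are checked one after another, and
time is linear in size. The host names and the function are
hypothetical, and the serial-per-host assumption may well be too
pessimistic.

```python
def parallel_fsck_hours(hosts, hours_per_tb):
    """Estimate wall-clock fsck time for a set of hosts.

    hosts: mapping of host name -> list of target sizes in TB.
    Hosts run in parallel; targets on one host are assumed to be
    checked serially, so the slowest host sets the downtime.
    """
    return max(sum(sizes) * hours_per_tb for sizes in hosts.values())

# 500 hosts with one 2TB OST each (the 1PB example above).
cluster = {f"oss{i}": [2.0] for i in range(500)}

# One host carrying six 2TB OSTs, all needing a check.
single = {"oss0": [2.0] * 6}

print(parallel_fsck_hours(cluster, 1.0), "hours (cluster, 1h/TB)")
print(parallel_fsck_hours(single, 3.0), "hours (single host, 3h/TB)")
```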

I wonder whether, if one had those 6x2TB OSTs on the same RAID10,
parallel checking would be faster thanks to all those arms.

adilger> The distributed checking phase (lfsck) is not needed
adilger> before returning the filesystem to service, and can also
adilger> be run while the filesystem is in use. We are planning to
adilger> eliminate the need for running a separate lfsck entirely,
adilger> and the filesystem will just do "scrubbing" internally
adilger> all the time during idle times or as a low-priority task.

Ahh interesting too, but this may not always be feasible: the
application I am thinking of has 24x7 simultaneous read and write
rates of around 100MB/s each (and yes using just a single system
is unfortunately non-negotiable right now).

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
