On 02.10.2010 14:11, Volker Armin Hemmann wrote:
> On Saturday 02 October 2010, Florian Philipp wrote:
[...]
>>
>> Assumptions:
>>
>> 1. Seek time is constant. For HDDs we can take an average value. Of
>> course this doesn't work for tapes. They have a seek time which
>> increases linearly with the distance between the fragments.
> 
> I think you misunderstood my remark.
> 
> Tapes try to stream. Take an old DLT drive with 5-10 MB/s streaming speed. 
> Slow, isn't it?
> 
> But when you do a backup to such an old tape, even a modern hard disk has 
> trouble keeping it streaming. As soon as you hit a directory with many 
> small files - like ~/Mail or /usr/portage - you are screwed. 
> 
> Yes, you have a wonderful 100 MB/s when you read a big, fat file. Or a single 
> small file. But when you have hundreds, thousands or hundreds of thousands of 
> small files, hard disks suck.
> And your tape drive has to stop and rewind every couple of seconds because 
> your hard disk was not able to keep up the required 10 MB/s. Truly 
> pathetic.

Well, that's exactly what my little math shows. When you read 4 kB files,
you can end up with 0.0065 * 50 MB/s ≈ 0.32 MB/s effective throughput
(worst case).
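
For reference, here is that worst case as a small Python sketch. The
12.5 ms average access time is an assumption (roughly 8.3 ms average
seek plus 4.2 ms rotational latency on a 7200 RPM drive; the half-
rotation figure is worked out further down), not a datasheet value:

  # Effective throughput when every 4 kB file read costs one full seek.
  FILE_SIZE = 4 * 1024        # file size in bytes
  SEQ_THROUGHPUT = 50e6       # 50 MB/s sequential read, in bytes/s
  ACCESS_TIME = 12.5e-3       # assumed avg. seek + rotational latency, in s

  transfer_time = FILE_SIZE / SEQ_THROUGHPUT   # time spent reading data
  total_time = ACCESS_TIME + transfer_time     # one seek per file
  efficiency = transfer_time / total_time      # fraction spent reading

  print("efficiency: %.4f" % efficiency)             # ~0.0065
  print("effective: %.2f MB/s" % (efficiency * 50))  # ~0.33 MB/s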

> 
> Besides, seek times are not constant ;)
> 

Sure they aren't. That's why it is stated as an assumption. It is just a
model. Like every model, it has its limits.[1] It doesn't take
prefetching, caching and NCQ/TCQ into account, for example.

Still, it is a valid assumption: *on average*, the read/write head has to
move around half the radius of the platter to reach its next position
and it has to wait for half a rotation until the right block is under
the head. If we assume that fragments are uniformly distributed over the
whole disk, we can simply take an average value for seek times.
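
As a quick sanity check of the access time assumed above, again as a
sketch (7200 RPM is a typical spindle speed and the 8.3 ms average
seek is assumed, not measured):

  RPM = 7200
  rotational_latency = 60.0 / RPM / 2  # wait half a rotation: ~4.2 ms
  avg_seek = 8.3e-3                    # assumed average seek time, in s
  print("%.1f ms" % ((avg_seek + rotational_latency) * 1e3))  # ~12.5 ms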

The model also doesn't take into account that even with no
fragmentation, there might be some seek operations: blocks on an HDD are
organized in concentric rings (tracks), not as a spiral like the groove
on a good old LP. That means that at some point, the r/w head has to
switch to the next track when a file does not fit on a single track.

[1] A bit off-topic: I work in applied sciences and engineering. There
I've learned two basic rules about models: 1. Truth doesn't matter,
usefulness does. 2. Every model has its limits. Knowing these limits is
the single most important thing when using a model.
