On 07/29/1999 10:24 -0700, Lance Robinson wrote:
>> AFAIK: RAID-5 accesses are always in stripes. All disks are read (or
>> written) no matter how small the original read/write request. Whereas, RAID0
>> can read just one disk for smaller requests. RAID5 does a lot more work for
>> smaller requests.
>>
>> <>< Lance.
>>
>>
I haven't looked at the Linux implementation code, but if this
is true, somebody needs to be whacked... That's **verrrry**
sub-optimal. For writes, it is most efficient if you can
write-gather so that you write an entire stripe at once, but
you do not need to access the entire row on reads - there
should be (almost) no overhead for reads, unless you've
already lost a disk and have to reconstruct the missing data
from the parity.
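As a minimal sketch of that degraded-read case (just the XOR
math - reconstruct_chunk() is a hypothetical helper, not
anything from the actual md driver):

#include <stddef.h>

/* Rebuild the chunk from the failed disk by XORing the surviving
 * data chunks together with the parity chunk for the same row. */
static void reconstruct_chunk(unsigned char *out,
                              const unsigned char *surviving[],
                              size_t nsurviving,  /* surviving data + parity */
                              size_t chunk_bytes)
{
    size_t i, d;
    for (i = 0; i < chunk_bytes; i++) {
        unsigned char x = 0;
        for (d = 0; d < nsurviving; d++)
            x ^= surviving[d][i];
        out[i] = x;  /* parity XOR surviving data == missing data */
    }
}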
For example, consider a 3+1 RAID5 (4 drives) with a 1-block
chunk size. The following scenarios show how I think things
should probably happen (assuming that Drive3 is the parity
drive for this row of chunks, and that reads/writes are aligned
at the beginning of the row - any other access can be broken
into separate events that are each equivalent to one of these):
Event            Drive0       Drive1       Drive2       Drive3       Reads  Writes
---------------  -----------  -----------  -----------  -----------  -----  ------
1 block read     read                                                    1       0
2 block read     read         read                                       2       0
3 block read     read         read         read                          3       0
1 block write    read/write                             read/write       2       2
1 block write    write        read         read         write            2       2
2 block write    read/write   read/write                read/write       3       3
2 block write    write        write        read         write            1       3
3 block write    write        write        write        write            0       4
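Just to make the arithmetic behind that table explicit, here's
a throwaway sketch (pure counting, nothing driver-specific -
rmw_cost() and reconstruct_cost() are made-up names) of the
disk I/O totals for a k-chunk write in an n+1 array:

struct io_cost { int reads; int writes; };

/* Read-modify-write: read the old data under the k new chunks plus
 * the old parity, then write the k new chunks plus the new parity. */
static struct io_cost rmw_cost(int n, int k)
{
    struct io_cost c;
    (void)n;            /* cost is independent of the untouched drives */
    c.reads  = k + 1;
    c.writes = k + 1;
    return c;
}

/* Reconstruct-write: read the n-k untouched data chunks, then write
 * the k new chunks plus the recomputed parity (no reads at all for a
 * full-stripe write). */
static struct io_cost reconstruct_cost(int n, int k)
{
    struct io_cost c;
    c.reads  = n - k;
    c.writes = k + 1;
    return c;
}

With n = 3 those two functions reproduce the write rows above
(e.g. a 2-block write costs 3 reads + 3 writes via
read-modify-write, but only 1 read + 3 writes via reconstruct).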
Note the alternatives to the 1 and 2 block write scenarios
that avoid issuing 2 consecutive I/Os (a read followed by a
write) to the same drive - back-to-back I/Os to the same block
must wait for a full rotation of the platter, unless you are
using the drive's write cache, in which case you're just asking
for trouble for a number of other reasons.
Also, for any of the writes, the parity cannot be rewritten
until enough data is available to recalculate it - either
by reading the data drives that are not involved in the write,
or by reading the old parity and the old contents of the drives
being written and XORing those with the new data. So, assuming
the reads can be issued in parallel and the writes can be
issued in parallel, and that each disk access takes W ms, a
1 or 2 block write must take at least 2W ms to complete (a
read phase followed by a write phase), whereas a 3 block write
can be done in 1W - this is why large (i.e. full-stripe)
writes are better under RAID5.
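Or, reduced to code (hypothetical, and assuming all the reads
batch into one parallel phase of ~W ms and all the writes into
another):

/* Number of ~W ms phases for a k-chunk write on n data drives:
 * a partial-stripe write needs a read phase (old data/parity, or
 * the untouched chunks) before the write phase; a full-stripe
 * write just writes. */
static int write_phases(int n, int k)
{
    return (k < n) ? 2 : 1;
}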
Note that none of this takes into account any RAID bookkeeping
that has to be done, either. Again, I haven't looked really
closely at the Linux implementation, so I don't know whether
it does any dirty-region logging or anything like that, which
would just make the whole analysis that much more complex...
Hope that's useful info to someone... (Also hope I haven't
goofed something up there... ;-> )
tw
--
+------------------------------+--------------------------+
| Tim Walberg | Phone: 847-782-2472 |
| TERAbridge Technologies Corp | FAX: 847-623-1717 |
| 1375 Tri-State Parkway | [EMAIL PROTECTED] |
| Gurnee, IL 60031 | 800-SKY-TEL2 PIN 9353299 |
+------------------------------+--------------------------+