On Thu, 19 Aug 1999, James Manning wrote:
> > What is the rationale for running sw raid0 over hw raid0,
> > using a single hw raid controller? I don't quite see why
> > it should be superior to the all-hw solution.
> > Now, if you have multiple hw raid controllers, or if you have
> > anemic controllers and want to do sw raid5 over hw raid0,
> > or something like that I can begin to understand.
>
> My bonnie results for raid5 getting done in hardware have been horrible.
> Admittedly, this is with a single card (until my shipment comes in)
> so it's 4 500MHz Xeon's using MMX vs. a single StrongARM 233 (no SIMD)
> in the XOR battle. Due to MMX, KNI, etc. I really don't expect h/w
> raid to do better unless the memory hierarchy bottlenecks, but the KNI
> scheme seems to help address that.
It's not only the XOR battle -- for one thing, the raid5 driver will not
use all four processors to do the XORs simultaneously. The kernel md
thread can run in parallel with the user-space bonnie program, but the
pass over the stripe hash table which performs the XORs is currently
done by a single processor and does not take advantage of several. So
there is still room to get better numbers from SW RAID on multi-CPU setups.
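To make that concrete, here is a minimal user-space sketch (in C, with a
made-up buffer layout and names -- not the actual md code) of the kind of
single-threaded parity pass the md thread does for one stripe; only the
CPU running this loop does any XOR work, no matter how many CPUs the box has:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK_SIZE (32 * 1024)  /* bytes per chunk, matching the example below */
    #define DATA_DISKS 4            /* 5-disk raid5: 4 data disks + 1 parity disk */

    /* XOR the data chunks of one stripe together to produce the parity chunk.
     * In the real driver this work is done by the single md kernel thread,
     * so it uses exactly one CPU regardless of how many are installed. */
    static void compute_parity(unsigned char *parity,
                               unsigned char *data[DATA_DISKS])
    {
        size_t i;
        int d;

        memcpy(parity, data[0], CHUNK_SIZE);
        for (d = 1; d < DATA_DISKS; d++)
            for (i = 0; i < CHUNK_SIZE; i++)
                parity[i] ^= data[d][i];
    }

    int main(void)
    {
        unsigned char *data[DATA_DISKS], *parity = malloc(CHUNK_SIZE);
        int d;

        for (d = 0; d < DATA_DISKS; d++) {
            data[d] = malloc(CHUNK_SIZE);
            memset(data[d], d + 1, CHUNK_SIZE);
        }
        compute_parity(parity, data);
        printf("parity[0] = 0x%02x\n", parity[0]);  /* 0x01^0x02^0x03^0x04 = 0x04 */
        return 0;
    }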
Another area in which software RAID has a huge advantage is sequential
writes: most of the system memory is used for buffering, and the RAID
layer actually *changes* the order in which requests are queued to the
low-level driver.
To illustrate this, suppose that in a 5-disk setup with a 32KB chunk
size the RAID layer is getting write requests for blocks 0 - 64KB, so
that in stripe 0 (which consists of the first 32KB of each disk),
only disk0 and disk1 are getting requests:

        disk0   disk1   disk2   disk3   disk4 (suppose parity)
HW:       x       x
The RAID layer will go back to the buffer cache and actively search
system memory for the missing disk2 and disk3 chunks, and queue them
at this opportunity, so that we are able to compute the parity
without any reads and write the whole stripe in one step:

        disk0   disk1   disk2   disk3   disk4
SW:       x       x     found   found   parity computed right away
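A rough sketch of that decision, in C with invented helper names
(find_in_buffer_cache() and friends are placeholders, not the real
buffer-cache interface), might look like this:

    #include <stdio.h>
    #include <stdlib.h>

    #define DATA_DISKS 4

    /* One stripe: for each data disk, either a buffer queued for writing,
     * a clean buffer found in the cache, or nothing at all in memory. */
    struct stripe {
        unsigned char *chunk[DATA_DISKS];   /* NULL if we have no copy in memory */
        int            dirty[DATA_DISKS];   /* 1 if this chunk was queued for write */
    };

    /* Placeholder for a buffer-cache lookup: the real code searches the
     * buffer cache hash table; this toy version pretends the block is
     * always resident. */
    static unsigned char *find_in_buffer_cache(int disk)
    {
        (void)disk;
        return calloc(1, 32 * 1024);
    }

    /* Before writing a partial stripe, try to complete it from memory so
     * the parity can be computed without reading anything back from disk. */
    static int try_full_stripe_write(struct stripe *s)
    {
        int d;

        for (d = 0; d < DATA_DISKS; d++) {
            if (s->chunk[d])
                continue;                  /* already queued or already found */
            s->chunk[d] = find_in_buffer_cache(d);
            if (!s->chunk[d])
                return 0;                  /* missing: fall back to reads */
        }
        return 1;  /* all chunks in memory: compute parity, write whole stripe */
    }

    int main(void)
    {
        struct stripe s = { { 0 }, { 0 } };

        /* disk0 and disk1 were written by the application ... */
        s.chunk[0] = calloc(1, 32 * 1024);  s.dirty[0] = 1;
        s.chunk[1] = calloc(1, 32 * 1024);  s.dirty[1] = 1;

        /* ... and disk2/disk3 are hopefully still in the buffer cache. */
        printf(try_full_stripe_write(&s) ?
               "full-stripe write, no reads needed\n" :
               "partial stripe, must read before computing parity\n");
        return 0;
    }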
This is an optimization which is hard for the hardware RAID card to
perform, since it can't control the order in which Linux queues
requests to it; from its point of view it sees only disks 0 and 1
in the stripe, and might choose to perform 2 reads (of disk2 and disk3)
to be able to compute the parity.
A lot of cache memory on the RAID card would certainly help, as Linux
is likely to queue the missing chunks of the stripe as well before the
card's cache fills up, but the card can't force Linux to do it, whereas
in software RAID we always search all of the system memory allocated for
buffers for the missing chunks of the stripe, and so avoid most
read-modify-write and reconstruct-write operations during sequential writes.
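To put rough numbers on it, for the 2-chunks-out-of-4 write in the example
above (and assuming the old data is not already in memory), the three
strategies cost roughly:

    read-modify-write:  read old disk0, disk1 and old parity (3 reads),
                        then write disk0, disk1 and parity   (3 writes)
    reconstruct-write:  read disk2 and disk3                 (2 reads),
                        then write disk0, disk1 and parity   (3 writes)
    full-stripe write:  no reads, write disk0-disk3 + parity (5 writes)

so finding the missing chunks in the buffer cache turns every stripe of a
sequential write into the no-read case.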
Gadi
>
> The s/w 0 over h/w 0 is more about trying to find and fix whatever is
> causing the high CPU util... using 99%+ of 4 Xeon's to get 40-50MB/sec
> is kinda silly, esp. given that a single one of them spec'd over 1GB/sec
> in the xor testing initially *shrug*.
>
> James
> --
> Miscellaneous Engineer --- IBM Netfinity Performance Development
>
>