James G. Sack (jim) wrote:
> My understanding is that software raids do parallel access effectively
> the same as hardware controllers. But hw offloads the parity and error

This isn't necessarily true; the explanation is below, but it takes a
bit of setup. Basically, it has to do with the way most normal
architectures (especially x86{,_64}) handle hardware interrupts.
The way most proper RAID controllers are built involves a bit of
encapsulation. First, there's a device that sits on the host's PCI bus
and serves as an interface between the OS and the RAID function. That
device
normally has a fast, specialized CPU which does all the xor/xsum/DMA/IO
operation reordering, etc. This CPU will typically have its own memory
and PCI bus. On that PCI bus will be standard (or sometimes specialized)
interface chipsets which actually talk to the disks in question.

Here's where the difference in parallelism comes out: the specialized
CPU on
the raid board will be able to more efficiently handle the interrupts
given to it. Normally, these CPUs will have some sort of vectored
interrupt table so they can service interrupts in parallel. x86
hardware, especially, lacks this. The IOAPIC is a poor substitute for
doing it properly; the CPU can still only service interrupts
sequentially. Not
to mention the fact that it has to compete with things like, oh,
userland processes that may be feeding data to the disks as fast as
possible. Or the user moving the mouse across the screen, resizing a
window, whatever. Read: it has better things to do.

Not only this, but I mentioned IO operation reordering. The hardware
controllers are *very* good at understanding that seeks are bad and at
avoiding them at all costs. So, they'll internally reorder all the IO
operations sent to them before flushing anything to disk or doing a
media read.
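
As a toy illustration of the idea (this is nowhere near what the
controller firmware actually does, and the request list is invented),
a little Python sketch: sorting a batch of pending requests by sector
before issuing them turns a pile of random seeks into something much
closer to a single sweep across the platter.

    # Toy "elevator": sort pending requests by sector so the head sweeps
    # in one direction instead of seeking back and forth.
    pending = [(7200, "read"), (15, "write"), (7210, "read"), (980, "write")]

    def one_sweep(requests):
        # Service requests in ascending sector order; adjacent sectors
        # end up back to back, so most of the expensive seeks disappear.
        return sorted(requests, key=lambda req: req[0])

    for sector, op in one_sweep(pending):
        print(op, "at sector", sector)
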
Your Linux kernel also does this; it has several flavours available to
do so. The one used by default since 2.6.16 is cfq, or completely fair
queueing. It's a good mix of performance for multi-use systems.

There's also as, or anticipatory. When IOs are performed, this
scheduler waits
for a small period to see if more IOs are coming to that same area of
the media. If so, it'll service those before servicing anything else.
It's great for desktop systems that typically do one thing at a time.
It's terrible for multi-use systems because it results in IO
starvation fairly quickly.

There's a third scheduler, deadline, which is great for database loads
and mixed-mode access. IOs come in, and are set aside,
but given a hard deadline for completion. More IOs will come in, and the
scheduler will reorder them. When the timer is up for the pending
operation, it'll schedule all IOs in the same area to go to disk at
once. But, since there's a guarantee that IOs will be serviced in a
certain amount of time, you don't get the starvation you see with as.
It's a great scheduler for file servers and database boxes.

The fourth option in the default kernel is noop, which is exactly what
the name
says: a no-op. IOs are flushed/read to/from media as soon as they're
received(*) without being reordered. It's great for devices where there
is no seek penalty (like flash). It's also great for hardware RAID
controllers, because their scheduling is done on-board. There's no
reason for the host CPU to deal with reordering IOs if the board is
just going to reorder them as soon as it receives them.

Better yet, you can select which IO scheduler to use both at boot time
(elevator=name on the kernel command line) and on the fly via sysfs
(since 2.6.17, /sys/block/<device>/queue/scheduler controls this).
(*) this isn't entirely true; Linux has an aggressive write-behind
caching layer.
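
If it helps, here's a rough Python sketch of poking at that sysfs knob.
The device name "sda" is just a stand-in for whatever disk you care
about, and writing the file needs root:

    # Show the available IO schedulers for a disk (the active one is
    # listed in brackets, e.g. "noop anticipatory deadline [cfq]"),
    # then switch it to noop.
    dev = "sda"  # substitute your own block device
    path = "/sys/block/%s/queue/scheduler" % dev

    with open(path) as f:
        print("schedulers:", f.read().strip())

    with open(path, "w") as f:
        # Same effect as booting with elevator=noop, but per-device and
        # without a reboot.
        f.write("noop")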

> checking/handling onto the cpu in the controller, thus gaining performance..

I've seen throughput increase 2-3 times, and system load decrease by a
corresponding amount, just by switching from software to hardware RAID.
This is all the funnier because the disks didn't change at all, nor did
the controller chips that were directly accessing them (I was using
Marvell SATA controllers on the host; turns out my RAID board of choice
also uses these internally).
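
For a sense of what's actually being offloaded: RAID-5 parity is just a
byte-wise XOR across the data chunks in a stripe, which software raid
has to grind through on the host CPU for every write (and again to
reconstruct data when a disk drops out). A minimal Python sketch, using
an invented three-data-disk stripe:

    # XOR parity over a hypothetical 3-data-disk stripe, plus recovery
    # of a lost chunk from the survivors and the parity.
    def xor_parity(chunks):
        parity = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, byte in enumerate(chunk):
                parity[i] ^= byte
        return bytes(parity)

    stripe = [b"disk0data", b"disk1data", b"disk2data"]
    parity = xor_parity(stripe)

    # Lose disk1; XORing the survivors with the parity gets it back.
    assert xor_parity([stripe[0], stripe[2], parity]) == stripe[1]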

> - a. protection from damaging meddling (direct disk access) from other
>   sw on the host

Right. You can easily dd over the wrong disk with a misplaced
argument. No amount of software RAID will prevent this.

> & (usually)
> - b. prevention of debugging, monitoring or exploratory recovery operations.

Right; this is all done in the card's ROM.

However, this isn't to completely discount software raid; it definitely
has its place. I just don't use it where extremely high performance is
required. Often, my systems have better things to do
than service storage interrupts all day long and reorder IOs they don't
need to reorder.
-kelsey
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list