James G. Sack (jim) wrote:

> My understanding is that software raids do parallel access effectively
> the same as hardware controllers. But hw offloads the parity and error

This isn't necessarily true; the reason is below, though it takes a bit of explanation. Basically, it comes down to the way most common architectures (especially x86 and x86_64) handle hardware interrupts.

The way most proper RAID controllers are built involves a bit of encapsulation. First, there's a device sitting on the host's PCI bus that serves as the interface between the OS and the RAID function. That device normally carries a fast, specialized CPU which does all the xor/checksum/DMA/IO-operation reordering, etc. This CPU will typically have its own memory and its own PCI bus, and on that bus sit standard (or sometimes specialized) interface chipsets which actually talk to the disks in question.

Here's where the difference in parallelism comes out: the specialized CPU on the RAID board can handle the interrupts given to it far more efficiently. These CPUs normally have some sort of vectored interrupt table, so they can effectively service interrupts in parallel. x86 hardware in particular lacks this; the IOAPIC is a poor substitute for doing it properly, and a given CPU can still only service interrupts sequentially. Not to mention that the host CPU has to compete with things like, oh, userland processes that may be feeding data to the disks as fast as possible. Or the user moving the mouse across the screen, resizing a window, whatever. Read: it has better things to do.
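If you want to see that contention on a software RAID box, you can watch how the storage controller's interrupts pile up on the host CPUs. Here's a rough Python sketch that totals them up from /proc/interrupts; the "ahci" string is just an example, so substitute whatever name your SATA/SAS controller shows up under.

#!/usr/bin/env python3
"""Rough sketch: total up the storage interrupts each host CPU has serviced.

Assumes the standard Linux /proc/interrupts layout and that the controller's
lines can be spotted by a substring ("ahci" here is only an example)."""

SUBSTRING = "ahci"  # hypothetical; use whatever your controller reports

def storage_irq_counts(path="/proc/interrupts", match=SUBSTRING):
    with open(path) as f:
        cpus = f.readline().split()          # header row: CPU0 CPU1 ...
        totals = [0] * len(cpus)
        for line in f:
            if match not in line:
                continue
            fields = line.split()
            # fields[0] is "NN:"; the next len(cpus) fields are per-CPU counts
            for i, count in enumerate(fields[1:1 + len(cpus)]):
                if count.isdigit():
                    totals[i] += int(count)
    return dict(zip(cpus, totals))

if __name__ == "__main__":
    for cpu, count in storage_irq_counts().items():
        print(f"{cpu}: {count} storage interrupts serviced")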

Not only this, but I mentioned IO operation reordering. The hardware controllers are *very* good about understanding that seeks are bad and should be avoided at all costs, so they internally reorder all the IO operations sent to them before flushing to disk or doing a media read. Your Linux kernel also does this, and it has several flavours available:

- cfq (completely fair queueing), the default since 2.6.18, is a good mix of performance for multi-use systems.

- as (anticipatory): when an IO is performed, this scheduler waits a small period to see if more IOs are coming to the same area of the media; if so, it services those before servicing anything else. It's great for desktop systems that typically do one thing at a time, and terrible for multi-use systems because it leads to IO starvation fairly quickly.

- deadline is great for database loads and mixed-mode access. IOs come in and are set aside, but given a hard deadline for completion. More IOs come in and the scheduler reorders them; when the timer is up for a pending operation, it schedules all IOs in the same area to go to disk at once. Since there's a guarantee that IOs will be serviced within a certain amount of time, you don't get the starvation you see with as. It's a great scheduler for file servers and database boxes.

- noop is exactly what the name says: a no-op. IOs are flushed to or read from media as soon as they're received(*), without being reordered. It's great for devices with no seek penalty (like flash). It's also great for hardware RAID controllers, because their scheduling is done on-board; there's no reason for the host CPU to reorder IOs the board is just going to reorder again as soon as it receives them.

Better yet, you can select which IO scheduler to use both at boot time (elevator=name on the kernel command line) and on the fly via sysfs (since 2.6.17, /sys/block/<device>/queue/scheduler controls this).

(*) this isn't entirely true; Linux has an aggressive write-behind caching layer.
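For the curious, here's a minimal sketch of poking at that sysfs interface from Python; "sda" is only an example device, and writing the scheduler file needs root.

#!/usr/bin/env python3
"""Minimal sketch: inspect and switch a block device's IO scheduler via
sysfs. The boot-time equivalent is elevator=<name> on the kernel command
line. "sda" is just an example; writing the file requires root."""

import sys

def current_and_available(dev="sda"):
    path = f"/sys/block/{dev}/queue/scheduler"
    with open(path) as f:
        # Reads something like: "noop anticipatory deadline [cfq]"
        names = f.read().split()
    current = next(n.strip("[]") for n in names if n.startswith("["))
    available = [n.strip("[]") for n in names]
    return current, available

def set_scheduler(dev, name):
    # Writing a scheduler name into the file switches it on the fly.
    with open(f"/sys/block/{dev}/queue/scheduler", "w") as f:
        f.write(name)

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
    cur, avail = current_and_available(dev)
    print(f"{dev}: current={cur}, available={avail}")
    # e.g. on a hardware RAID volume you might do: set_scheduler(dev, "noop")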

> checking/handling onto the cpu in the controller, thus gaining performance..

I've seen throughput increase two to three times, and system load decrease correspondingly, just by switching from software to hardware RAID. What makes this especially funny is that the disks didn't change at all, nor did the controller chips directly accessing them (I was using Marvell SATA controllers on the host; it turns out my RAID board of choice also uses these internally).
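As an aside, the parity work being offloaded is conceptually simple; here's a toy sketch of RAID-5-style parity, just to show the sort of per-write arithmetic the host CPU gets to skip when the board does it. The chunk contents are made up for illustration.

#!/usr/bin/env python3
"""Toy illustration of the parity work a RAID controller offloads: for a
RAID-5 style stripe, parity is the XOR of the data chunks, and any one lost
chunk can be rebuilt by XOR-ing the survivors. With software RAID the host
CPU does this for every write; with hardware RAID the board's CPU does."""

def xor_parity(chunks):
    # Byte-wise XOR across all chunks in a stripe.
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]      # three data chunks (made up)
    p = xor_parity(stripe)
    # "Lose" the second chunk and rebuild it from the survivors plus parity.
    rebuilt = xor_parity([stripe[0], stripe[2], p])
    assert rebuilt == stripe[1]
    print("rebuilt chunk:", rebuilt)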

> - a. protection from damaging meddling (direct disk access) from other
>   sw on the host

Right. You can easily dd over the wrong disk with a misplaced argument. No amount of software RAID will prevent this.
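If you want a seatbelt, a tiny script can at least refuse to touch a device that's currently mounted. This is only an illustration of the point (it won't notice a disk that's an md member or an LVM PV, for example), and /dev/sdX is a placeholder.

#!/usr/bin/env python3
"""Toy sanity check before doing something destructive (dd, mkfs, ...) to a
block device: refuse if the device or one of its partitions is mounted."""

import sys

def is_mounted(device):
    # /proc/mounts lists "<source> <mountpoint> <fstype> ..." per line.
    with open("/proc/mounts") as f:
        for line in f:
            source = line.split()[0]
            if source == device or source.startswith(device):
                return True
    return False

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdX"  # placeholder
    if is_mounted(target):
        sys.exit(f"{target} (or a partition on it) is mounted; refusing.")
    print(f"{target} does not appear to be mounted; proceed with caution.")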

> & (usually)

> - b. prevention of debugging, monitoring or exploratory recovery operations.

Right; this is all handled in the card's own ROM/firmware.



However, this isn't to completely discount software RAID; it definitely has its place. I just don't use it where extremely high performance is required. Often, my systems have better things to do than service storage interrupts all day long and reorder IOs they don't need to reorder.

-kelsey


