On Jun 28, 2019, at 8:46 AM, Blake Hudson <bl...@ispn.net> wrote:
> 
> Linux software RAID…has only decreased availability for me. This has been due 
> to a combination of hardware and software issues that are generally 
> handled well by HW RAID controllers, but are often handled poorly or 
> unpredictably by desktop oriented hardware and Linux software.

Would you care to be more specific?  I have little experience with software 
RAID, other than ZFS, so I don’t know what these “issues” might be.

I do have a lot of experience with hardware RAID, and the grass isn’t very 
green on that side of the fence, either.  Some of this will repeat others’ 
points, but it’s worth repeating, since it means they’re not alone in their 
pain:


0. Hardware RAID is a product of its time.  My old parallel IDE and SCSI RAID 
cards are useless because you can’t get disks with those interfaces any more; 
my oldest SATA and SAS RAID cards can’t talk to disks bigger than 2 TB; and 
even the older hardware RAID cards that still work won’t accept an array 
created by a controller of another type, even one from the same company.  (Try 
attaching a 3ware 8000-series RAID to a 3ware 9000-series card, for example.)

Typical software RAID never drops backwards compatibility.  You can always 
attach an old array to new hardware.  Or even new arrays to old hardware, 
within the limitations of the hardware, and those limitations aren’t the 
software RAID’s fault.
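
For example (a sketch only; “tank” is a made-up pool name), moving an old 
array to a new machine is usually a one-liner, whether it’s ZFS or Linux md:

    # ZFS: scan the attached disks and import the pool found on them
    zpool import tank

    # Linux md: scan for existing arrays and assemble them
    mdadm --assemble --scan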


1. Hardware RAID requires hardware-specific utilities.  Many hardware RAID 
systems don’t work under Linux at all, and of those that do, not all provide 
sufficiently useful Linux-side utilities.  If you have to reboot into the RAID 
BIOS to fix anything, that’s bad for availability.


2. The number of hardware RAID options is going down over time.  Adaptec’s 
almost out of the game, 3ware was bought by LSI and then had their products all 
but discontinued, and most of the other options you list are rebadged LSI or 
Adaptec.  Eventually it’s going to be LSI or software RAID, and then LSI will 
probably get out of the game, too.  This market segment is dying because 
software RAID no longer has any practical limitations that hardware can fix.


3. When you do get good-enough Linux-side utilities, they’re often not 
well-designed.  I don’t know anyone who likes the megaraid or megacli64 
utilities.  I have more experience with 3ware’s tw_cli, but I never developed 
more than a pidgin-level facility with it, so to do anything even slightly 
uncommon, I have to go back to the manual to piece the command together, or 
else risk roaching the still-working disks.

By contrast, I find the zfs and zpool commands well-designed and easy to use.  
There’s no mystery why that should be so: hardware RAID companies have their 
expertise in hardware, not software.  Also, “man zpool” doesn’t suck. :)
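
For instance (just a sketch; “tank”, “sda” and “sdb” are placeholder names), 
creating and caring for a ZFS mirror is about as terse as it gets:

    zpool create tank mirror sda sdb   # build a mirrored pool
    zpool status tank                  # health and per-disk error counters
    zpool scrub tank                   # verify every block against its checksum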

That coin does have an obverse face, which is that young software RAID systems 
go through a phase where they have to re-learn just how false, untrustworthy, 
unreliable, duplicitous, and mendacious the underlying hardware can be.  But 
that expertise builds up over time, so that a mature software RAID system copes 
quite well with the underlying hardware’s failings.

The inverse expertise in software design doesn’t build up on the hardware RAID 
side.  I assume this is because they fire the software teams once they’ve 
produced a minimum viable product, then re-hire a new team when their old 
utilities and monitoring software get so creaky that they have to be rebuilt 
from 
scratch.  Then you get a *new* bag of ugliness in the world.

Software RAID systems, by contrast, evolve continuously, and so usually tend 
towards perfection.

The same problem *can* come up in the software RAID world: witness how much 
wheel reinvention is going on in the Stratis project!  The same amount of 
effort put into ZFS would have been a better use of everyone’s time.

That option doesn’t even exist on the hardware RAID side, though.  Every 
hardware RAID provider must develop their command line utilities and monitoring 
software de novo, because even if the Other Company open-sourced its software, 
that other software can’t work with their proprietary hardware.


4. Because hardware RAID is abstracted below the OS layer, the OS and 
filesystem have no way to interact intelligently with it.

ZFS is the pinnacle of this technology, but CentOS is finally starting to get 
some of it through Stratis and the extensions Stratis has required in XFS and 
LVM.  I assume btrfs also provides some of these benefits, though that’s on 
track to become off-topic here.

ZFS can tell you which file is affected by a block that’s bad across enough 
disks that redundancy can’t fix it.  This gives you a new, efficient recovery 
option: restore that file from backup or delete it, allowing the underlying 
filesystem to rewrite the bad block on all disks.  With hardware RAID, fixing 
this requires picking one disk as the “real” copy and telling the RAID card to 
blindly rewrite all the other copies.
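
With ZFS, the recovery workflow looks roughly like this (a sketch; the pool 
and file names are invented):

    zpool status -v tank     # lists the files with permanent errors
    # restore the named file from backup, or simply delete it, e.g.:
    rm /tank/data/damaged.file
    zpool scrub tank         # re-verify the pool
    zpool clear tank         # reset the error counters once it comes up clean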

Another example is resilvering: because a hardware RAID has no knowledge of the 
filesystem, a resilver during disk replacement requires rewriting the entire 
disk, which takes 8-12 hours these days.  If the volume has a lot of free 
space, a filesystem-aware software RAID resilver can copy only the blocks 
containing user data, greatly reducing recovery time.
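
With ZFS, for example, the swap is one command, and the resilver it kicks off 
walks only the allocated blocks (pool and device names are placeholders):

    zpool replace tank old-disk new-disk   # copies only live data to new-disk
    zpool status tank                      # shows resilver progress and ETA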

Anecdotally, I can tell you that the ECCs involved in NAS-grade SATA hardware 
aren’t good enough on their own.  We had a ZFS server that would detect about 
4-10 kB of bad data on one disk in the pool during every weekend scrub.  We 
never figured out whether the problem was in the disk, its drive cage slot, or 
its cabling, but it was utterly repeatable.  But also utterly unimportant to 
diagnose, because ZFS kept fixing the problem for us, automatically!
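
That weekend scrub is nothing exotic, just a cron job; something like this 
sketch, assuming a pool named “tank”:

    # /etc/cron.d/zfs-scrub -- kick off a scrub early every Sunday morning
    0 2 * * 0  root  /sbin/zpool scrub tank

Afterward, “zpool status tank” shows what the scrub found and repaired in its 
per-device CKSUM counters.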

The thing is, we’d never have known about this underlying hardware fault at 
all if ZFS’s 256-bit block checksums hadn’t caught it; they reduce the chance 
of an undetected error to practically-impossible levels.  And because those 
same checksums tell ZFS which copy of the data is uncorrupted, it fixed the 
problem for us automatically, every time, for years on end.  I doubt any 
hardware RAID system you favor would have fared as well.

*That’s* uptime. :)


5. Hardware RAID made sense back when a PC motherboard rarely had more than 2 
hard disk controller ports, and those shared a single IDE channel.  In those 
days, CPUs were slow enough that calculating parity was really costly, and hard 
drives were small enough that 8+ disk arrays were often required just to get 
enough space.

Now that you can get 10+ SATA ports on a mobo, parity calculation costs only a 
tiny slice of a single core in your multicore CPU, and a mirrored pair of 
multi-terabyte disks is often plenty of space, hardware RAID is increasingly 
being pushed to the margins of the server world.

Software RAID doesn’t have port count limits at all.  With hardware RAID, I 
don’t buy a 4-port card when a 2-port card will do, because that costs me 
$100-200 more.  With software RAID, I can usually find another place to plug in 
a drive temporarily, and that port was “free” because it came with the PC.

This matters when I have to replace a disk in my hardware RAID mirror, because 
now I’m out of ports.  I have to choose one of the disks to drop out of the 
array, losing all redundancy before the recovery even starts, because I need to 
free up one of the two hardware connectors for the new disk.

That’s fine when the disk I’m replacing is dead, dead, dead, but that isn’t 
usually the case in my experience.  Instead, the disk I’m replacing is merely 
*dying*, and I’m hoping to get it replaced before it finally dies.

What that means in practice is that with software RAID, I can have an internal 
mirror, then temporarily connect a replacement drive in a USB or Thunderbolt 
disk enclosure.  Now the resilver operation proceeds with both original disks 
available, so that if we find that the “good” disk in the original mirror has a 
bad sector, too, the software RAID system might find that it can pull a good 
copy from the “bad” disk, saving the whole operation.

Only once the resilver is complete do I have to choose which disk to drop out 
of the array in a software RAID system.  If I choose incorrectly, the software 
RAID simply stops and lets me choose again.
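
In ZFS terms, that workflow is roughly this (device names are invented): grow 
the two-way mirror into a three-way mirror with the temporary drive, wait out 
the resilver, and only then detach the disk you no longer trust:

    zpool attach tank good-disk usb-disk   # mirror gains a third member
    zpool status tank                      # wait for the resilver to finish
    zpool detach tank dying-disk           # redundancy drops only now, by choice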

With hardware RAID, if I choose incorrectly, it’s on the front end of the 
operation instead, so I’ll end up spending 8-12 hours to create a redundant 
copy of “Wrong!”


Bottom line: I will not shed a tear when my last hardware RAID goes away.