Re: On the subject of RAID-6 corruption recovery
On Mon, 7 Jan 2008, Thiemo Nagel wrote:

What you call "pathologic" cases are very common when it comes to real-world data. It is not at all unusual to find sectors filled with only a constant (usually zero, but not always), in which case your **512 becomes **1.

Of course, it would be easy to check how many of the 512 bytes really differ on a case-by-case basis, correct the exponent accordingly, and only perform the recovery when the corrected probability of introducing an error is sufficiently low.

What is the alternative to recovery, really? Just erroring out and letting the admin deal with it, or blindly assuming that the parity is wrong?

/Mattias Wadenstein
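For illustration, counting the distinct byte values of a suspect sector from userspace is a one-liner (a sketch only; the device name and sector number are made-up placeholders):

  # Count distinct byte values in one 512-byte sector.
  SECTOR=123456
  dd if=/dev/sdX bs=512 count=1 skip=$SECTOR 2>/dev/null \
    | od -An -tx1 -v | tr -s ' ' '\n' | grep -v '^$' | sort -u | wc -l
  # A result of 1 means a constant-filled sector, i.e. the case where
  # the **512 in the probability argument collapses towards **1.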
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed, 19 Dec 2007, Justin Piszcz wrote:

> Now to my setup / question:
>
> # fdisk -l /dev/sdc
>
> Disk /dev/sdc: 150.0 GB, 150039945216 bytes
> 255 heads, 63 sectors/track, 18241 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x5667c24a
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1               1       18241   146520801   fd  Linux raid autodetect
>
> If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct
> start and end size if I wanted to make sure the RAID5 was stripe
> aligned? Or is there a better way to do this, does parted handle this
> situation better?

From that setup it seems simple: scrap the partition table and use the whole-disk device for raid. This is what we do for all data storage disks (hw raid) and sw raid members.

/Mattias Wadenstein
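A sketch of that whole-disk approach (device names are illustrative; mdadm's chunk size is given in KiB):

  # No partition table means no alignment offset to worry about.
  mdadm --create /dev/md0 --level=5 --raid-devices=10 --chunk=1024 \
      /dev/sd[c-l]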
Re: Raid over 48 disks
On Wed, 19 Dec 2007, Neil Brown wrote:

> On Tuesday December 18, [EMAIL PROTECTED] wrote:
>
>> We're investigating the possibility of running Linux (RHEL) on top of
>> Sun's X4500 Thumper box: http://www.sun.com/servers/x64/x4500/
>> Basically, it's a server with 48 SATA hard drives. No hardware RAID.
>> It's designed for Sun's ZFS filesystem. So... we're curious how Linux
>> will handle such a beast. Has anyone run MD software RAID over so many
>> disks? Then piled LVM/ext3 on top of that? Any suggestions?

There are those that have run Linux MD RAID on thumpers before. I vaguely recall some driver issues (unrelated to MD) that made it less suitable than Solaris, but that might be fixed in recent kernels.

> Alternately, 8 6-drive RAID5s or 6 8-drive RAID6s, and use RAID0 to
> combine them together. This would give you adequate reliability and
> performance and still a large amount of storage space.

My personal suggestion would be 5 9-disk raid6s, one raid1 root mirror and one hot spare. Then raid0, lvm, or separate filesystems on those 5 raidsets for the data, depending on your needs. You get almost as much data space as with the 6 8-disk raid6s, and you have a separate pair of disks for all the small updates (logging, metadata, etc), so this makes a lot of sense if most of the data is bulk file access.

/Mattias Wadenstein
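A sketch of that layout in mdadm terms (all device names are hypothetical; the remaining 48th drive would be kept as the hot spare):

  # One of the five 9-disk raid6 sets; repeat for the other four groups
  # of nine drives (md2..md5).
  mdadm --create /dev/md1 --level=6 --raid-devices=9 /dev/sd[b-j]
  # Root mirror on the two leftover drives.
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdau /dev/sdav
  # Stripe the five raid6 sets together (or use them as LVM PVs instead).
  mdadm --create /dev/md6 --level=0 --raid-devices=5 \
      /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5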
Re: limits on raid
On Thu, 21 Jun 2007, Neil Brown wrote:

> I have that - apparently naive - idea that drives use strong checksums,
> and will never return bad data, only good data or an error. If this
> isn't right, then it would really help to understand what the causes of
> other failures are before working out how to handle them.

In theory, that's how storage should work. In practice, silent data corruption does happen. If not from the disks themselves, then somewhere along the path of cables, controllers, drivers, buses, etc. If you add in fcal, you get even more sources of failure, but usually you can avoid SANs (if you care about your data).

Well, here are a couple of the issues that I've seen myself:

A hw-raid controller returning every 64th bit as 0, no matter what's on disk, with no error condition at all. (I've also heard from a colleague about the same thing on every 64k, but I haven't seen that myself.)

An fcal switch occasionally resetting, garbling the blocks in transit with random data. We lost a few TB of user data that way.

Add to this the random driver breakage that happens now and then. I've also had a few broken filesystems due to in-memory corruption from bad RAM; not sure there is much hope of fixing that, though.

Also, this presentation is pretty worrying on the frequency of silent data corruption:
https://indico.desy.de/contributionDisplay.py?contribId=65&sessionId=42&confId=257

/Mattias Wadenstein
Paranoid read mode for raid5/6
Hi *,

Is there a way to tell md to be paranoid and verify the parity for raid5/6 on every read? I guess this would come with a (significant) performance hit, but sometimes that's not a big deal (unlike disks scrambling your data).

Also, regarding data paranoia: for check/repair of a raid6, is an effort made to figure out which participant device is misbehaving, or is the parity just blindly recalculated?

/Mattias Wadenstein
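For comparison, the existing md sysfs scrubbing interface is a one-shot pass over the whole array rather than per-read verification, roughly like this (array name is just an example):

  # Trigger a background consistency check of /dev/md0; writing "repair"
  # instead of "check" also rewrites inconsistent stripes.
  echo check > /sys/block/md0/md/sync_action
  # Mismatch count reported after the last check/repair pass:
  cat /sys/block/md0/md/mismatch_cnt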
Re: Grow a RAID-6 ?
On Fri, 23 Mar 2007, Gordon Henderson wrote:

> Are there any plans in the near future to enable growing RAID-6 arrays
> by adding more disks into them? I have a 15x500GB-drive unit and I need
> to add another 15 drives into it... Hindsight is telling me that maybe
> I should have put LVM on top of the RAID-6, however, the usable 6TB it
> yields should have been enough for anyone...

Well, if you are doubling the space, you could take this opportunity to put lvm on the new disks, move all the data over, and then add the old disks as a pv, extending the lvm space.

I really wouldn't recommend having a 30-disk raid6; imagine the rebuild time after a failed disk...

/Mattias Wadenstein
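Roughly, that migration looks like this (device names, vg/lv names, and the filesystem are made up for illustration, not a tested recipe):

  # New 15-disk raid6 becomes the first PV of a fresh volume group.
  mdadm --create /dev/md1 --level=6 --raid-devices=15 /dev/sd[b-p]
  pvcreate /dev/md1
  vgcreate datavg /dev/md1
  lvcreate -l 100%FREE -n data datavg
  mkfs.ext3 /dev/datavg/data
  # ... copy the data over from the old raid6 (e.g. with rsync) ...
  # Then wipe the old array and add it as a second PV:
  pvcreate /dev/md0
  vgextend datavg /dev/md0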
Re: Linux: Why software RAID?
On Wed, 23 Aug 2006, Chris Friesen wrote:

> Jeff Garzik wrote:
>
>> But anyway, to help answer the question of hardware vs. software RAID,
>> I wrote up a page: http://linux.yyz.us/why-software-raid.html
>
> Just curious... with these guys
> (http://www.bigfootnetworks.com/KillerOverview.aspx) putting linux on a
> PCI NIC to allow them to bypass Windows' network stack, has anyone ever
> considered doing "hardware" raid by using an embedded cpu running linux
> software RAID, with battery-backed memory?

I'd expect this to be the reason why md offload support for xor engines and the like is turning up. It makes very little sense for a modern server/desktop CPU, but for embedded ones it does.

/Mattias Wadenstein
Re: Random Seek on Array as slow as on single disk
On Sun, 16 Jul 2006, A. Liemen wrote:

> Hardware Raid.
> http://www.areca.com.tw/products/html/pcix-sata.htm

If you want to see even worse performance with bonnie, try running several of them in parallel and in sync; somewhere around 6-15 simultaneous read sessions[*] should give you rather horrible performance numbers.

That said, synthetic benchmarks aren't everything. Check whether it is an issue with a realistic load too. In some cases there can be quite a difference.

/Mattias Wadenstein

[*] Exact numbers may vary; for me it was as low as 6 on a 12-disk raid6, where the throughput dropped from ~300M/s to ~30M/s, but smaller raidsets had a break-off point closer to 8-10 reading processes.

> Jeff Breidenbach schrieb:
>> Controller: Areca ARC 1160 PCI-X 1GB Cache
>
> Those numbers are for Areca hardware raid or linux software raid?
> --Jeff
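If you want to reproduce that kind of measurement cheaply, something along these lines works (paths and the number of readers are arbitrary examples):

  # Start N sequential readers at once and watch the aggregate
  # throughput drop as N grows.
  N=8
  for i in $(seq 1 $N); do
      dd if=/mnt/raid/bigfile$i of=/dev/null bs=1M &
  done
  wait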
Re: Strange intermittant errors + RAID doesn't fail the disk.
On Fri, 7 Jul 2006, Neil Brown wrote:

> On Thursday July 6, [EMAIL PROTECTED] wrote:
>
>>> I suggest you find a SATA related mailing list to post this to (look
>>> in the MAINTAINERS file maybe) or post it to linux-kernel.
>>
>> linux-ide couldn't help much, aside from recommending a bleeding-edge
>> patchset which should fix a lot of SATA things:
>> http://home-tj.org/files/libata-tj-stable/
>>
>> What fixed the error, though, was exchanging one of the cables. (Just
>> my luck, it was new and supposedly quality, ... oh well)
>>
>> I'm still interested in why the md code didn't fail the disk. While it
>> was 'up', any access to the array would hang for a long time,
>> ultimately fail, and corrupt the fs to boot. When I failed the disk
>> manually everything was fine (if degraded) again.
>
> md is very dependent on the driver doing the right thing. It doesn't do
> any timeouts or anything like that - it assumes the driver will. md
> simply trusts the return status from the drive, and fails a drive if
> and only if a write to the drive is reported as failing (if a read
> fails, md tries to over-write with good data first).

Hmm... Perhaps a bit of extra logic there might be good? If you try to re-write the failing bit with good data, try to read the recently written data back (perhaps after a bit of a wait). If that still fails, then fail the disk. If a drive can't remember recently written data, it is clearly unsuitable for a running system, but the occasional block going bad (and getting remapped on a write) wouldn't trigger it.

/Mattias Wadenstein
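As a userspace illustration of the read-back idea (not how md itself would do it; device, sector, and file names are placeholders):

  # Overwrite the suspect block, then read it back uncached and compare.
  dd if=good_block.bin of=/dev/sdX bs=512 seek=12345 count=1 \
      oflag=direct conv=fsync
  sleep 1
  dd if=/dev/sdX of=readback.bin bs=512 skip=12345 count=1 iflag=direct
  cmp good_block.bin readback.bin && echo "drive remembers the write" \
      || echo "read-back failed; fail the disk"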
Re: RAID5E
On Wed, 31 May 2006, Bill Davidsen wrote:

> Where I was working most recently some systems were using RAID5E (RAID5
> with both the parity and hot spare distributed). This seems to be
> highly desirable for small arrays, where spreading head motion over one
> more drive will improve performance, and in all cases where a rebuild
> to the hot spare will avoid a bottleneck on a single drive. Is there
> any plan to add this capability?

What advantage does that have over raid6? You use exactly as many drives (n+2), with the disadvantages of having to do the rebuild without parity protection when a drive fails, and of losing the array on a double disk failure.

/Mattias Wadenstein
Re: Recommendations for supported 4-port SATA PCI card ?
On Thu, 30 Mar 2006, Mark Hahn wrote:

>> 3ware. Period. If you're going to use md, get the 8506-4 series rather
>> than either of the 9xxx series cards.
>
> before the 9550, I never found them attractive in price/performance:
> expensive as hell and a lot slower than MD. but the 9550 is really
> quite impressive...

I'd still recommend 3ware for the md case, because of reliability. For my 6-disk raid, an 8506-4 was actually slightly faster than two sil3114s running md raid5. The 3ware card also has the advantage of not pushing the machine into a hard hang about once per day when doing io over the raid layer.

You're right, though, in that the hw raid performance was rather bad until the 9550.

/Mattias Wadenstein
Re: Hard drive lifetime: wear from spinning up or rebooting vs running
On Sun, 5 Feb 2006, David Liontooth wrote:

> In designing an archival system, we're trying to find data on when it
> pays to power or spin the drives down versus keeping them running. Is
> there a difference between spinning up the drives from sleep and from a
> reboot? Leaving out the cost imposed on the (separate) operating system
> drive.

Hitachi claims "5 years (Surface temperature of HDA is 45°C or less). Life of the drive does not change in the case that the drive is used intermittently." for their Ultrastar 10K300 drives.

I suspect that the best estimates you're going to get are from the manufacturers, if you can find the right documents (OEM specifications, not marketing blurbs). For their Deskstar (sata/pata) drives I didn't find lifetime estimates beyond 50000 start-stop cycles.

/Mattias Wadenstein
Re: RAID 16?
On Thu, 2 Feb 2006, Matthias Urlichs wrote:

> Hi,
>
> David Liontooth wrote:
>
>> We're wondering if it's possible to run the following:
>> * define 4 pairs of RAID 1 with an 8-port 3ware 9500S card
>> * the OS will see these as four normal drives
>> * use md to configure them into a RAID 6 array
>
> Hmm. You'd have eight disks, five(!) of which may fail at any time,
> giving you two disks of capacity. Ouch. That's not "very high"
> redundancy, that's "insane". ;-)
>
> In your case, I'd install the eight disks as a straight 6-disk RAID6
> with two spares. Four disks may fail (just not within the time it takes
> to reconstruct the array...), giving you four disks of capacity.

Yes, but then you (probably) lose hotswap. A feature here was to use the 3ware hw raid for the raid1 pairs and rely on the hw-raid hotswap instead of having to deal with linux hotswap (unless both drives in a raid1 set die).

> A net win, I'd say, esp. since it's far more likely that your two power
> supplies will both die within the time you'd need to replace one. But
> it's your call.

If you only need reliability and can spend a few extra disks, I don't find the setup so bad.

/Mattias Wadenstein
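The md side of that setup would just be a raid6 across the four exported units, something like (device names hypothetical):

  # The four hw raid1 pairs exported by the 3ware card show up as
  # ordinary block devices; md stripes raid6 across them, leaving two
  # disks' worth of capacity.
  mdadm --create /dev/md0 --level=6 --raid-devices=4 \
      /dev/sda /dev/sdb /dev/sdc /dev/sdd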
Re: multiple Sata SATAII 150, TX4 - how to tell which drive is which? headaches galore!
On Tue, 24 Jan 2006, Francois Barre wrote:

> So my (real) question would be:
> - how much does a drive consume? how long can it be on a diet?
> and, on top of this: Is it possible to make the drives turn slower? To
> make the heads move slower? That would be my dream. No more heat, a
> 10mA consumption, no more noise...

Some manufacturers do produce documentation regarding this:
http://www.hitachigst.com/tech/techlib.nsf/products/Deskstar_7K250

It is the "Specification - OEM" document[s] that are interesting, in my case lately "Deskstar 7K250 Specification v1.6 (Serial ATA)". On page 27 you have "Table 21: Power supply of current models", where you can read the power consumption of a drive during various kinds of load (startup, idle, random r/w, sleep). Here you can also see that the "Silent" mode makes the drive use 8.5W instead of 10.6W under "Random R/W Average" load. Not that I know how to activate "Silent" mode from Linux, but I expect the document to tell you what bits you should send over the cable.

I suspect other manufacturers have similar documents, but I don't know if they are public or how to find them. Some pointing and clicking on websites will probably tell you what you need for your drives.

/Mattias Wadenstein
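One guess for doing it without reading the spec: hdparm's Automatic Acoustic Management setting controls quiet seeks on drives that support it, which may well be the same "Silent" mode (untested on these particular drives; device name is an example):

  # Query the current Automatic Acoustic Management setting, if supported:
  hdparm -M /dev/sda
  # 128 = quietest/slowest seeks, 254 = fastest/loudest:
  hdparm -M 128 /dev/sda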
Re: Problems with multiple Promise SATA150 TX4 cards
On Wed, 25 Jan 2006, Christopher Smith wrote:

> Something else I tried was some crappy dual-port SIL-based SATA card
> with two of the Promise TX4s, and that worked without a problem. While
> I'm waiting to find out what this is, I might buy another one and use
> the two of them temporarily so I can build my RAID array, at least.

This suddenly sounds familiar from my experiences with Promise cards, but from the PATA days. Two cards worked fine, but adding a third made lots of strange problems appear. So my experience matches yours: at most 2 Promise cards in one machine.

This experience of strange and not strictly reproducible problems with both stability and drive detection made me stay away from buying more Promise cards in favour of "crappy" SIL-based SATA cards in the latest upgrade. Unfortunately, this did not help stability that much; I'm currently hunting down a reproducible hard hang when doing raid activity on the drives. So far reads with dd from the individual disk devices are fine (even from all at the same time), but dd from the raid5 over the disks hangs the system within an hour or so.

But more on this in a separate thread when I have enough info to be useful and I'm sure the numbers are repeatable (unless someone finds preliminary findings really interesting).

/Mattias Wadenstein
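For reference, the two read tests mentioned above look roughly like this (device names are examples):

  # Reading from each member disk in parallel...
  for d in /dev/sd[a-f]; do dd if=$d of=/dev/null bs=1M & done; wait
  # ...versus a sustained read through the md raid5 layer:
  dd if=/dev/md0 of=/dev/null bs=1M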
Re: Selective spin-up
On Sun, 22 Jan 2006, Brad Campbell wrote:

> John Hendrikx wrote:
>
>> I recently extended my raid array with a 9th drive, and I find that
>> the 300 watt PSU I use is insufficient to start the system. What
>> happens is that I activate the machine, the machine starts powering up
>> for 3 seconds or so (spinning all the hard drives up about half way),
>> then power cuts out. I've heard that it is possible to spin up disks
>> one at a time or in groups, but cannot find any such option in the
>> BIOS (Asus A8N-E board). Any ideas?
>
> And just in response to the original question, this is called
> "staggered spin-up" and is usually a feature of larger raid/scsi
> controllers and intelligent backplanes. Usually not found in cheap
> hardware.
>
> I had a similar problem with a 480W PSU and my 15 drive array.. I
> looked for selective spinup and other hacks, but to be honest I figured
> that even if the trip point is under spin-up load only, I was running
> the PSU too close to the wire to be comfortable. I bought a 600W unit
> and re-wired its dual 12V rails to do the job.. Problem solved.

Heh, I just did this for an 18 drive machine too. Much better than the old duct-taped 2-PSU "solution":
http://www.acc.umu.se/~maswan/bilder/20060117-Kryddis/

I also run a smart check every morning (6 days short, 1 day long) simultaneously on all drives to make sure the system behaves under load.

> Do yourself a favour and grab a shiny new PSU.. I have 14 drives
> hanging off a 420W unit in my other box that behave perfectly.. Just
> look at the 12V rail capacity on any PSU you choose and make sure it
> will cope with the spool-up load. I grabbed Maxtor's docs on these
> drives, calculated the load for 15 drives and worked from there.. My
> practical measurements came to within about 15% of the theoretical, so
> the docs are not too bad.

Don't forget to take a look at the 5V load too. The last batch of drives I got for that machine was Hitachi sata drives, which claim 1.3A at "max r/w load". Lots of PSUs, even with high W ratings (600/800), have only 20 or 30A rated 5V load, and you want some left over for the rest of the system.

Now, if I could only get rid of these load-induced sata_sil (or whatever it is) cold hangs, everything would be awesome. :)

/Mattias Wadenstein
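The 5V arithmetic for a box like that is simple enough (the per-drive figure is the one quoted from the spec sheet above; treat the rest as illustrative):

  # 18 drives at 1.3 A each on the 5V rail:
  echo "18 drives * 1.3 A = $(echo "18 * 1.3" | bc) A on the 5V rail"
  # Compare that 23.4 A against the 20-30 A 5V rating of many PSUs, and
  # leave headroom for the rest of the system.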
Re: [linux-lvm] Re: more info on the hang with 2.6.15-rc5
On Mon, 9 Jan 2006, Sebastian Kuzminsky wrote:

> Matthew Gillen <[EMAIL PROTECTED]> wrote:
>
>> Sebastian Kuzminsky wrote:
>>
>>> Still broken in 2.6.15. With all the debugging options OFF in the
>>> config, the system stayed up < 24 hours under load, then a hard
>>> lockup like before: nothing on the console, magic sysrq doesn't work,
>>> no caps-lock, no ping. Note that this is different from before: it
>>> actually ran a little bit before locking up, rather than locking up
>>> within seconds like it did with 2.6.15-rc5.
>>
>> I couldn't quite tell from your description: are you getting the
>> lockup when you try to mount a filesystem that uses the RAID+LVM as a
>> device? Or do you get errors when doing LVM-level stuff (i.e. no
>> filesystem can even be put on the device)? In the case of the former,
>> what filesystem are you using?
>
> No errors ever, just hard lockups.

Just wanted to add that I recently started to see similar behaviour (hard lockups with no errors during heavy writing to the filesystem) on raid5+lvm+reiserfs after adding 6 sata_sil-connected disks in raid5 to an already existing vg of a couple of pata raid5s. This is 2.6.14 though, and I haven't ruled out hw issues yet (going to put a beefier PSU in it first), but it might be related.

I'm going to try adding debugging symbols to make it stable after the PSU upgrade, if that doesn't fix it. :)

/Mattias Wadenstein
Re: First RAID Setup
On Thu, 22 Dec 2005, Bill Davidsen wrote:

> If you are seeing dual drive failures, I suspect your hardware has
> problems. We run multiple 3 and 6 TB databases, and over a dozen 1 TB
> data caching servers, all using a lot of small fast disks, and I
> haven't seen a real dual drive failure in about 8 years. We did see
> some cases which looked like dual failures; it turned out to be a
> firmware limitation, the controller not waiting for the bus to settle
> after a real failure and thinking the next i/o had failed (or similar,
> in any case a false fail on the transaction after the real fail). If
> you run two PATA drives on the same cable in master/slave, it's at
> least possible that this could happen with consumer grade hardware as
> well. Just a thought, dual failures are VERY unlikely unless one
> triggers the other in some way, like failing the bus or cabinet power
> supply.

Not really, it depends on how lucky you are with your disks. We've had real dual-drive failures in a system with hot spares, where the second drive failed during resync. Now, we have gotten the manufacturer to replace those with another model, but some of these problems don't show up until a year or two into production use (I haven't seen one quite this bad since, but I have seen a high replacement rate ramping up after a while).

Choosing high-end disks usually helps, but raid6 is really great in that you always have redundancy, even when replacing a failed or failing drive. If you have a large raidset with a fairly heavy load, the resync time can easily extend into days, if not weeks. With raid5, during that entire period, if one more drive fails you're screwed.

Btw, in our practical usage we haven't seen that big a difference between raid5 and raid6, but I guess that depends on your usage pattern.

/Mattias Wadenstein
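As a rough illustration of how quickly resync times blow up (the drive size and effective rebuild rates here are made-up examples, not measurements):

  # 500 GB drive, rebuild throttled to ~5 MB/s by competing load:
  echo "500 * 1024 / 5 / 3600" | bc -l   # hours to resync one drive
  # ~28 hours; at an effective 1 MB/s the same rebuild takes nearly
  # 6 days, all of it spent without redundancy on raid5.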
Re: random lockups, raid problems SOLVED (plus a question)
On Tue, 6 Dec 2005, Michael Stumpf wrote:

> Sure it's a FAQ. It's probably even documented. And I know it, but it
> still surprised me. Such is life: 2 of 3 sticks of perfectly good ECC
> ram in an old server-class p3 board apparently have gone bad. Result?
> Random lockups/reboots with nothing in the system logs to even lend a
> clue. Memtest86 showed one problem immediately, and after some time,
> exposed some more. Remove the bad memory and it works fine.
>
> Is there some daemon that can more actively monitor memory function? I
> must have had this problem for months, but with sputtering hard drives
> that were slowly dying and causing very similar problems, this
> diagnosis got muddled.

1. Run memtest if you experience instability.

2. Use a system that supports ecc and enable it in the bios; that way you are likely to get a proper machine fault instead of lockups etc. Note, though, that bugs are very common in these systems: our last 3 clusters over the last 4 years have all had bios errata where ecc showed as enabled but wasn't actually enabled.

/Mattias Wadenstein