Dan,

Copy from my backup? But, it's RAID! I don't need a backup!
;-)

You appear to be right on the money, which sort of annoys me. :-)

The whole point of RAID (particularly RAID 5) is that it's supposed to bring a level of reliability to the system: if a disk fails, data isn't lost. Bringing the whole system to a hard-locked, crashing death is hardly "reliable", unless you count the fact that it reliably locked up solid at exactly the same point of the resync each time.

I suppose technically I didn't lose any data. From boot-up until the resync hit the magical 40.1% (it restarted the sync from zero each time that happened), I was able to copy files off the array, and I could do it over and over after each restart as often as it took to make sure everything was safely backed up. Not that I cared: I never fully trusted the thing anyway, so I'd used it as my junk space (things I'd downloaded, saved somewhere else, or backed up to CD/DVD), and nothing of value would have been lost even if I'd had no backup of it. There were only two files I couldn't copy off, which I guess must have been sitting on the bad spot of the bad disk, because trying to copy them locked the system up instantly.

But back to you being right. First, I bought a new drive, removed all the partitions from all the drives, and started from scratch. Of course I didn't pick the bad drive on the first try, so I put the whole array together with one of the existing drives replaced by the new one, and watched it die promptly at 40.1% of the resync. I got it on the second try, though, and watched the array happily rebuild all the way to 100%. So it's clearly a drive problem.

To put your idea of spare drives to the test, I rebuilt the array again with 3 active drives and one spare, thus including the bad drive in the setup. And wouldn't you know it, it rebuilt the array and flagged the bad drive as faulty in the process rather than just falling over dead. How nice.
Actually, it flagged two drives as spare, one (the bad one) as "faulty spare", and left only one disk active in the RAID 5 array, which makes no sense at all, but at least it proves that it could find the faulty drive given the chance. It even logged a ton of error messages to /var/log/messages rather than just locking up with no feedback.

So there's the lesson for the day, I guess: when running RAID 5 with software RAID, put a spare drive in the setup to catch an event such as a failed disk. I wouldn't have thought it was necessary, but in this case it seems it is.

Thanks for the guidance, Dan. You are a guru. :-)

Ian

2009/1/3 Dan Graham <[email protected]>
>
> Hi Ian,
>
> I have seen this happen when you create an mdadm RAID5 array without a
> hot spare drive (4th disk). When a drive in the array fails with only
> 3 disks it cannot rebuild itself without the hot spare. You may be
> able to add an additional disk to the array and then try rebuilding it
> but it will take far less time to create an entirely new array and
> copy your backup data to it.
>
> All the best, Dan
>
_______________________________________________
clug-talk mailing list
[email protected]
http://clug.ca/mailman/listinfo/clug-talk_clug.ca
Mailing List Guidelines (http://clug.ca/ml_guidelines.php)
**Please remove these lines when replying
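As a rough illustration of how a faulty drive shows up once md gets the chance to flag it: the Linux md driver reports array state in /proc/mdstat, appending (F) to faulty members and (S) to spares. The following is a minimal Python sketch, assuming a single array line in that format; the sample line and the device names (sdb1 through sde1) are hypothetical, not taken from the array described above. For reference, an array with 3 active disks plus a hot spare is created with `mdadm --create` using `--level=5 --raid-devices=3 --spare-devices=1`.

```python
import re

# Matches one member entry in an md status line, e.g. "sdb1[0](F)".
# The optional letter in parentheses is the flag md appends: F = faulty, S = spare.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](?:\((\w)\))?")


def parse_md_line(line):
    """Split one /proc/mdstat array line into active, spare and faulty members."""
    # The array name precedes " : "; the member devices follow it.
    name, _, rest = line.partition(" : ")
    result = {"array": name.strip(), "active": [], "spare": [], "faulty": []}
    # Words such as "active" and "raid5" have no [slot] suffix, so the regex skips them.
    for dev, _slot, flag in MEMBER_RE.findall(rest):
        if flag == "F":
            result["faulty"].append(dev)
        elif flag == "S":
            result["spare"].append(dev)
        else:
            result["active"].append(dev)
    return result


if __name__ == "__main__":
    # Hypothetical sample line: one spare (sde1) and one faulty member (sdb1).
    sample = "md0 : active raid5 sde1[3](S) sdd1[2] sdc1[1] sdb1[0](F)"
    print(parse_md_line(sample))
```

On a live system the same function could be fed the lines of /proc/mdstat that start with an array name such as md0; the sketch deliberately ignores the block-count and recovery-progress lines that follow them.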

