Dan,

Copy from my backup?  But, it's RAID!  I don't need a backup!

;-)

You appear to be right on the money, which sort of annoys me.  :-)
The whole point of RAID (particularly RAID 5) is it's supposed to
bring a level of reliability to the system.  If a disk fails, data isn't
lost.  Bringing the whole system to a hard locked crashing death is
hardly "reliable", unless you count the fact that it did reliably lock
up solid at exactly the same point of the resync each time.  I suppose
technically I didn't lose any data: from boot up until the resync hit
the magical 40.1%, I was able to copy files off the array, and since
it restarted the sync from zero after each lockup, I could repeat that
after every restart for as long as it took to make sure everything was
safely backed up.  Not that I cared much.  I never fully trusted the
thing anyway, so I'd used it as junk space for things I'd downloaded,
saved somewhere else, or backed up to CD/DVD, and nothing of value
would have been lost even if I had no backup of it.  There were only
two files I couldn't copy off,
which I guess must have been sitting on the bad place on the bad disk,
because trying to copy them caused the system to lock up instantly
during the copy.

But back to you being right.  First, I bought a new drive, completely
removed all the partitions from all the drives and started from
scratch.  Of course I didn't get the bad drive on the first try, so I
put the whole array together with one existing drive replaced by the
new one, and watched it die promptly at 40.1% of the resync.  But I
got it on the second try and watched it happily rebuild all the way to
100%.  So, clearly it's a drive.  To put your idea of spare drives to
the test, I rebuilt the array again, with 3 active and one spare
drive, thus including the bad drive in the setup.  And wouldn't you
know, it rebuilt the array, and flagged the bad drive as faulty in the
process rather than just falling over dead.  How nice.  Actually it
flagged two as spare, one (the bad one) as "faulty spare", and left
only one disk active in the RAID 5 array, which makes no sense at all,
but at least it proves out that it could find the faulty drive given
the chance.  It even logged a ton of error messages to
/var/log/messages rather than just locking up with no feedback.
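
For anyone who wants to check for this without reading through the
logs, here is a rough sketch of picking the member states out of a
/proc/mdstat device line.  The sample line is made up for
illustration (it is not my actual output, and the device names are
assumptions); it only relies on the usual "(F)" and "(S)" markers
that md puts after faulty and spare members.

```python
import re

# Hypothetical /proc/mdstat device line, NOT output from the array above.
SAMPLE = "md0 : active raid5 sde1[3](S) sdd1[2] sdc1[1](F) sdb1[0]"

# Markers md appends to a member: (F) = faulty, (S) = spare.
FLAG_NAMES = {"": "active", "F": "faulty", "S": "spare"}


def member_states(mdstat_line):
    """Map each member device in one mdstat line to a readable state."""
    states = {}
    # Members look like "sdc1[1]" with an optional one-letter flag, e.g. "(F)".
    for name, flag in re.findall(r"(\w+)\[\d+\](?:\((\w)\))?", mdstat_line):
        # Any other flag letter is reported as-is rather than guessed at.
        states[name] = FLAG_NAMES.get(flag, "flag " + flag)
    return states


states = member_states(SAMPLE)
for dev in sorted(states):
    print(dev, states[dev])
```

On a real box you would feed it the lines of /proc/mdstat instead of
SAMPLE.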

So, there's the lesson for the day, I guess.  When running a RAID 5
with software RAID, put a spare drive in the setup to catch an event
such as a failed disk.  I wouldn't have thought it was necessary, but
in this case it seems it is.
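
A minimal sketch of what that setup looks like as an mdadm command,
with three active disks and one spare.  The array and partition names
are placeholders, not the ones from my machine, and the script only
builds and prints the command; it does not run it.

```python
import shlex


def mdadm_create_cmd(array, active, spares):
    """Build (but do not run) an mdadm command for RAID 5 with hot spares."""
    if len(active) < 3:
        raise ValueError("RAID 5 needs at least three active devices")
    cmd = ["mdadm", "--create", array, "--level=5",
           "--raid-devices=%d" % len(active)]
    if spares:
        cmd.append("--spare-devices=%d" % len(spares))
    # mdadm takes the active members first, then the spares.
    return cmd + list(active) + list(spares)


# Placeholder device names, assumed for illustration only.
cmd = mdadm_create_cmd("/dev/md0",
                       ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"],
                       ["/dev/sde1"])
print(shlex.join(cmd))
```

Double-check the device names before running the printed command for
real, since creating an array overwrites whatever is on those
partitions.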

Thanks for the guidance, Dan.  You are a guru.  :-)

Ian

2009/1/3 Dan Graham <[email protected]>
>
> Hi Ian,
>
> I have seen this happen when you create an mdadm RAID5 array without a
> hot spare drive (4th disk). When a drive in the array fails with only
> 3 disks it cannot rebuild itself without the hot spare. You may be
> able to add an additional disk to the array and then try rebuilding it
> but it will take far less time to create an entirely new array and
> copy your backup data to it.
>
> All the best, Dan
>
