On Thu, 2009-10-22 at 16:00 +1300, Roger Searle wrote:
> Hi, I noticed by chance that I have a failed drive in a raid1 array on a
> file server that I need to replace, and I'm seeking some guidance, or
> confirmation that I'm on the right track to resolve this. Since more than
> one partition has failed, it seems I'll need to buy a new disk rather than
> there being any repair option; I may as well get a new pair, but replace
> the failed disk first, then replace the other once that's resolved. Yes,
> I have backups of the valuable data on other drives, both in the same
> machine (not in this array) and elsewhere. I also then need to set up
> better monitoring, because the failure began a few weeks ago. But for
> now...
There are so many levels of electronics between you and the platter these
days that if you see even a single hard error, now's a good time to retire
the disk to skeet-shooting duty...
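
If you want a second opinion on the state of the disk itself, smartmontools
is handy, assuming it's installed (the device name is just a guess from your
mdstat output below):

  # dump the SMART attributes and error log for the suspect disk
  smartctl -a /dev/sdb

  # optionally kick off a long self-test and check back later with -a
  smartctl -t long /dev/sdb

Climbing reallocated or pending sector counts are usually all the
confirmation you need.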
> The failed disk is 320GB and contains (mirrored) /, /home, and swap.
> Presumably I could buy much larger disks, and would need to repartition
> before adding the new disk back into the array?
Best to use the same make/model of disk if possible. Speed differences
between the two can make the array unreliable (that's an exaggeration, but
you know what I mean).
> The partitions should be at least the same size, but could they be much
> larger without any problem?
If you want to add more space, then I'd buy a new pair of bigger disks,
create a new set, and copy everything across. The reason I'm saying that is
that your two existing disks are probably exactly the same make/model, with
similar serial numbers... Guess what's going to fail next (:

Last time I looked, 1TB was around the $125 mark.
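
If you do just swap in one bigger disk for now, the usual trick is to clone
the partition layout from the surviving disk before adding the partitions
back into the arrays. Something along these lines (a rough sketch, untested
here; it assumes MBR partition tables and that the replacement really is
/dev/sdb, so double-check device names first):

  # copy the partition table from the good disk to the new one
  sfdisk -d /dev/sda | sfdisk /dev/sdb

  # sanity-check the result
  fdisk -l /dev/sdb

Any extra space at the end of the bigger disk simply sits unused until you
grow the partitions and arrays later; md only mirrors up to the size of the
smaller member.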
> There is some configuration data in mdadm.conf, including UUIDs of the
> arrays, and this doesn't match the UUIDs in fstab. Do I need to be
> concerned about this sort of thing, or can I just use mdadm or other
> tools to rebuild the arrays, and that will update any relevant config
> files?
mdadm.conf is pretty much redundant, I think; arrays tend to be assembled
automagically at boot time these days. Building a new raid array *should*
add the correct data to it, although I have a grand old time with a Hardy
server of mine in this respect.
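
As an aside, the UUIDs in fstab are filesystem UUIDs, which are a different
thing from the md array UUIDs in mdadm.conf, so a mismatch between the two
is expected rather than a problem. If you do want mdadm.conf to reflect the
rebuilt arrays, you can regenerate the ARRAY lines from what's currently
running, roughly like this (the config lives at /etc/mdadm/mdadm.conf on
Debian/Ubuntu, /etc/mdadm.conf elsewhere):

  # print ARRAY lines with the current array UUIDs
  mdadm --detail --scan

  # append them to the config, after deleting the stale ARRAY lines by hand
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf

  # on Debian/Ubuntu, rebuild the initramfs so boot-time assembly matches
  update-initramfs -u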
> Is there anything else I should be looking out for or preparing?
Don't forget to install a boot loader on each new disk if it's going to
contain the boot partition as well.
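
With grub that's usually just a matter of re-running grub-install against
the new disk once its partitions are back in the arrays (again, assuming
the new disk is /dev/sdb and the BIOS is happy to boot from either drive):

  # put grub on the MBR of the new disk too, so either disk can boot
  grub-install /dev/sdb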
> Thanks for any pointers anyone may care to share.
You could try

  mdadm --add /dev/md3 /dev/sdb4

and see whether it resilvers. dmesg is the best place to look for hard
errors.
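
For the actual swap, the usual sequence looks something like this (untested
here, and the device names are just read off your mdstat output below, so
adjust to suit):

  # pull the dying disk's remaining members out of the arrays
  mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
  mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

  # power down, swap the disk, partition it to match, then add it back
  mdadm --add /dev/md0 /dev/sdb1
  mdadm --add /dev/md1 /dev/sdb2
  mdadm --add /dev/md2 /dev/sdb3
  mdadm --add /dev/md3 /dev/sdb4

  # keep an eye on the resync
  watch cat /proc/mdstat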
hth,
Steve
> A couple of examples of DegradedArray and Fail event emails to root
> recently follow:
>
> To: [email protected]
> Subject: Fail event on /dev/md1:jupiter
> Date: Wed, 21 Oct 2009 17:42:49 +1300
>
> This is an automatically generated mail message from mdadm
> running on jupiter
>
> A Fail event had been detected on md device /dev/md1.
> It could be related to component device /dev/sdb2.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md3 : active raid1 sda4[0]
>       290977216 blocks [2/1] [U_]
> md2 : active raid1 sda3[0] sdb3[1]
>       104320 blocks [2/2] [UU]
> md1 : active raid1 sda2[0] sdb2[2](F)
>       1951808 blocks [2/1] [U_]
> md0 : active raid1 sda1[0]
>       19534912 blocks [2/1] [U_]
> unused devices: <none>
>
> Subject: DegradedArray event on /dev/md3:jupiter
> Date: Wed, 07 Oct 2009 08:26:49 +1300
>
> This is an automatically generated mail message from mdadm
> running on jupiter
>
> A DegradedArray event had been detected on md device /dev/md3.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md3 : active raid1 sda4[0]
>       290977216 blocks [2/1] [U_]
> md2 : active raid1 sda3[0] sdb3[1]
>       104320 blocks [2/2] [UU]
> md1 : active raid1 sda2[0] sdb2[1]
>       1951808 blocks [2/2] [UU]
> md0 : active raid1 sda1[0]
>       19534912 blocks [2/1] [U_]
> unused devices: <none>
>
> Cheers,
> Roger