I agree with Steve - your best bet is to treat this as a warning and
expect to replace both drives in the near future.
<!> Why is md2 still a valid raidset? That's odd.
The RAID has succeeded in allowing you to plan this change, rather than
losing access to your files.
Since it's a software RAID, I suggest you plan on buying two new, larger
drives.
First try Steve's idea and just re-add the drive to the array.
It shouldn't hurt anything.
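Something like this should do it, assuming /dev/sdb is the suspect drive
and the layout matches the /proc/mdstat output quoted below (check the
exact device names on your box first):
mdadm /dev/md1 --remove /dev/sdb2   # sdb2 is marked (F), so drop it first
mdadm /dev/md1 --add /dev/sdb2
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md3 --add /dev/sdb4
cat /proc/mdstat                    # then see whether they start resyncing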
Then there are two ways to progress...
0 Boot in single user mode
1 Add one new drive to the machine and partition it with a similar layout,
making the partitions larger as appropriate (example commands after step 10).
2 Then use
mdadm --add /dev/md3 /dev/sdb4
mdadm --add /dev/md2 /dev/sdb3
mdadm --add /dev/md1 /dev/sdb2
mdadm --add /dev/md0 /dev/sdb1
sysctl -w dev.raid.speed_limit_max=99999999
3 While this is happening run
watch --int 10 cat /proc/mdstat
Wait until all the arrays have finished syncing
4 If you boot off this raidset you'll need to reinstall a boot loader on
each drive
5 Shut the machine down and remove the remaining 320 GB drive.
6 Install the other new drive, then boot.
7 Partition the other new drive the same as the first big drive
8 Repeat steps 2 and 3 but use sda rather than sdb
Once they've finished syncing you can grow the md devices and filesystems
to their full available space (see the examples after step 10)
9 Do the boot loader install onto both drives again
10 Then you can reboot and it should all be good.
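To make step 1, the growing and the boot loader steps concrete, here's
roughly what I'd run - the device names and filesystem type are just
examples, so adjust to suit:
# Step 1: partition the new drive with fdisk (or parted), making the
# partitions as large as you like and setting their type to fd
# (Linux raid autodetect).
fdisk /dev/sdb
# Once both new drives are in and synced, grow each md device out to the
# full size of its new partitions, then grow the filesystem on top
# (assuming ext3 here):
mdadm --grow /dev/md3 --size=max
resize2fs /dev/md3
# Boot loader onto both drives so either one can boot on its own:
grub-install /dev/sda
grub-install /dev/sdb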
The other way of doing it is to:
1 Add both new drives
2 Create a new single md device
3 Create a PV, add it to a VG, then create individual LVs as large as you
want. Leave some spare space so you can grow individual LVs later (rough
sketch after step 5)
4 Use something like rsync to copy the files from the old md to the new
LVs
5 Enjoy.
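A rough sketch of that second approach - the array, VG and LV names below
are made up, so substitute your own:
# Build a new mirror from the two new drives (sdc1/sdd1 just as examples)
mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
# Layer LVM on top of it
pvcreate /dev/md4
vgcreate vg0 /dev/md4
lvcreate -L 20G -n root vg0
lvcreate -L 250G -n home vg0    # leave free space in the VG for later
# Make filesystems and copy the data across
mkfs.ext3 /dev/vg0/root
mkfs.ext3 /dev/vg0/home
mount /dev/vg0/home /mnt
rsync -aHx /home/ /mnt/
# Growing an LV later is just:
lvextend -L +50G /dev/vg0/home
resize2fs /dev/vg0/home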
The old 320 GB drive that still works can be relegated to a Windows box or
something else that's unimportant. It could work for another 10 years or
it could fail tomorrow... who knows.
steve wrote, On 22/10/09 16:26:
On Thu, 2009-10-22 at 16:00 +1300, Roger Searle wrote:
Hi, I noticed by chance that I have a failed drive in a RAID1 array on a
file server, and I'm seeking some guidance or confirmation that I'm on the
right track to resolve this. Since more than one partition has failed it
seems I'll need to buy a new disk rather than attempt any repair, so I may
as well get a new pair: replace the failed disk first, then once that's
resolved replace the other. Yes, I have backups of the valuable data on
other drives, both in the same machine (not in this array) and elsewhere.
I also need to set up better monitoring, because the failure began a few
weeks ago. But for now...
There are so many levels of electronics that you go through to get to
the platter these days that if you see even a single hard error, then
now's a good time to use it only for skeet shooting...
The failed disk is 320 GB, and contains (mirrored) /, home, and swap.
Presumably I could buy much larger disks, and would need to repartition
prior to adding it back into the array?
Best to use the same make/model of disk if possible. Speed differences
between the two can make it unreliable (that's an exaggeration, but you
know what I mean).
The partitions should be at least the same size but could be much larger
without any problem?
If you want to add more space, then I'd buy a new pair of bigger disks,
create a new set, and copy everything across. Reason I'm saying that is
that your 2 existing disks are probably exactly the same make/model,
with similar serial numbers??? Guess what's going to fail next (:
Last time I looked, 1TB was around the $125 mark.
There is some configuration data in mdadm.conf, including UUIDs of the
arrays, and this doesn't match the UUIDs in fstab. Do I need to be
concerned about this sort of thing, or can I just use mdadm or other tools
to rebuild the arrays and have that update any relevant config files?
mdadm.conf is pretty redundant, I think. Arrays tend to be automagically
configured at boot time these days. Building a new RAID array *should*
add the correct data to it, although I have a grand old time with a Hardy
server of mine in this respect.
Is there anything else I should be looking out for or preparing?
Don't forget to add a bootstrap to each new disk if this is going to
contain the boot partition as well.
Thanks for any pointers anyone may care to share.
You could try
mdadm --add /dev/md3 /dev/sdb4
and see whether it resilvers. Looking in dmesg is the best place to check
for hard errors.
hth,
Steve
A couple of examples of the DegradedArray and Fail event emails sent to
root recently follow:
To: [email protected]
Subject: Fail event on /dev/md1:jupiter
Date: Wed, 21 Oct 2009 17:42:49 +1300
This is an automatically generated mail message from mdadm
running on jupiter
A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdb2.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md3 : active raid1 sda4[0]
290977216 blocks [2/1] [U_]
md2 : active raid1 sda3[0] sdb3[1]
104320 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[2](F)
1951808 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
19534912 blocks [2/1] [U_]
unused devices: <none>
Subject: DegradedArray event on /dev/md3:jupiter
Date: Wed, 07 Oct 2009 08:26:49 +1300
This is an automatically generated mail message from mdadm
running on jupiter
A DegradedArray event had been detected on md device /dev/md3.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md3 : active raid1 sda4[0]
290977216 blocks [2/1] [U_]
md2 : active raid1 sda3[0] sdb3[1]
104320 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
1951808 blocks [2/2] [UU]
md0 : active raid1 sda1[0]
19534912 blocks [2/1] [U_]
unused devices: <none>
Cheers,
Roger
--
Craig Falconer
The Total Team - Managed Systems
Office: 0800 888 326 / +643 974 9128
Email: [email protected]
Web: http://www.totalteam.co.nz/