On Thu, 2009-10-22 at 16:00 +1300, Roger Searle wrote:
> Hi, I noticed by chance that I have a failed drive in a raid1 array on a
> file server that I need to replace, and I'm seeking some guidance, or
> confirmation that I'm on the right track to resolve this. Since more than
> one partition has failed, it seems I'll need to buy a new disk rather than
> there being any repair option; I may as well get a new pair, but replace
> the failed disk first, then replace the other once that's resolved. Yes,
> I have backups of the valuable data on other drives, both in the same
> machine (not in this array) and elsewhere. I also then need to set up
> better monitoring, because the failure began a few weeks ago. But for
> now...
There are so many levels of electronics between you and the platter these
days that if you see even a single hard error, now's a good time to retire
the disk to skeet-shooting duty...
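
If you want a second opinion on the state of the disk itself, smartmontools
is handy, assuming it's installed (the device name is just a guess from your
mdstat output below):

  # dump the SMART attributes and error log for the suspect disk
  smartctl -a /dev/sdb

  # optionally kick off a long self-test and check back later with -a
  smartctl -t long /dev/sdb

Climbing reallocated or pending sector counts are usually all the
confirmation you need.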
> The failed disk is 320GB and contains (mirrored) /, /home, and swap.
> Presumably I could buy much larger disks, and would need to repartition
> before adding the new disk back into the array?
Best to use the same make/model of disk if possible. Speed differences
between the two can make the array unreliable (that's an exaggeration, but
you know what I mean).
> The partitions should be at least the same size, but could they be much
> larger without any problem?
If you want to add more space, then I'd buy a new pair of bigger disks,
create a new set, and copy everything across. The reason I'm saying that is
that your two existing disks are probably exactly the same make/model, with
similar serial numbers... Guess what's going to fail next (:

Last time I looked, 1TB was around the $125 mark.
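
If you do just swap in one bigger disk for now, the usual trick is to clone
the partition layout from the surviving disk before adding the partitions
back into the arrays. Something along these lines (a rough sketch, untested
here; it assumes MBR partition tables and that the replacement really is
/dev/sdb, so double-check device names first):

  # copy the partition table from the good disk to the new one
  sfdisk -d /dev/sda | sfdisk /dev/sdb

  # sanity-check the result
  fdisk -l /dev/sdb

Any extra space at the end of the bigger disk simply sits unused until you
grow the partitions and arrays later; md only mirrors up to the size of the
smaller member.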
> There is some configuration data in mdadm.conf, including UUIDs of the
> arrays, and this doesn't match the UUIDs in fstab. Do I need to be
> concerned about this sort of thing, or can I just use mdadm or other
> tools to rebuild the arrays, and that will update any relevant config
> files?
mdadm.conf is pretty much redundant, I think; arrays tend to be assembled
automagically at boot time these days. Building a new raid array *should*
add the correct data to it, although I have a grand old time with a Hardy
server of mine in this respect.
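
As an aside, the UUIDs in fstab are filesystem UUIDs, which are a different
thing from the md array UUIDs in mdadm.conf, so a mismatch between the two
is expected rather than a problem. If you do want mdadm.conf to reflect the
rebuilt arrays, you can regenerate the ARRAY lines from what's currently
running, roughly like this (the config lives at /etc/mdadm/mdadm.conf on
Debian/Ubuntu, /etc/mdadm.conf elsewhere):

  # print ARRAY lines with the current array UUIDs
  mdadm --detail --scan

  # append them to the config, after deleting the stale ARRAY lines by hand
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf

  # on Debian/Ubuntu, rebuild the initramfs so boot-time assembly matches
  update-initramfs -u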
> Is there anything else I should be looking out for or preparing?
Don't forget to install a boot loader on each new disk if it's going to
contain the boot partition as well.
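
With grub that's usually just a matter of re-running grub-install against
the new disk once its partitions are back in the arrays (again, assuming
the new disk is /dev/sdb and the BIOS is happy to boot from either drive):

  # put grub on the MBR of the new disk too, so either disk can boot
  grub-install /dev/sdb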
> Thanks for any pointers anyone may care to share.
You could try

  mdadm --add /dev/md3 /dev/sdb4

and see whether it resilvers. dmesg is the best place to look for hard
errors.
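
For the actual swap, the usual sequence looks something like this (untested
here, and the device names are just read off your mdstat output below, so
adjust to suit):

  # pull the dying disk's remaining members out of the arrays
  mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
  mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

  # power down, swap the disk, partition it to match, then add it back
  mdadm --add /dev/md0 /dev/sdb1
  mdadm --add /dev/md1 /dev/sdb2
  mdadm --add /dev/md2 /dev/sdb3
  mdadm --add /dev/md3 /dev/sdb4

  # keep an eye on the resync
  watch cat /proc/mdstat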
hth,
Steve
> A couple of examples of DegradedArray and Fail event emails to root
> recently follow:
>
> To: [email protected]
> Subject: Fail event on /dev/md1:jupiter
> Date: Wed, 21 Oct 2009 17:42:49 +1300
>
> This is an automatically generated mail message from mdadm
> running on jupiter
>
> A Fail event had been detected on md device /dev/md1.
> It could be related to component device /dev/sdb2.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md3 : active raid1 sda4[0]
>       290977216 blocks [2/1] [U_]
> md2 : active raid1 sda3[0] sdb3[1]
>       104320 blocks [2/2] [UU]
> md1 : active raid1 sda2[0] sdb2[2](F)
>       1951808 blocks [2/1] [U_]
> md0 : active raid1 sda1[0]
>       19534912 blocks [2/1] [U_]
> unused devices: <none>
>
> Subject: DegradedArray event on /dev/md3:jupiter
> Date: Wed, 07 Oct 2009 08:26:49 +1300
>
> This is an automatically generated mail message from mdadm
> running on jupiter
>
> A DegradedArray event had been detected on md device /dev/md3.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md3 : active raid1 sda4[0]
>       290977216 blocks [2/1] [U_]
> md2 : active raid1 sda3[0] sdb3[1]
>       104320 blocks [2/2] [UU]
> md1 : active raid1 sda2[0] sdb2[1]
>       1951808 blocks [2/2] [UU]
> md0 : active raid1 sda1[0]
>       19534912 blocks [2/1] [U_]
> unused devices: <none>
>
> Cheers,
> Roger