Roger Searle wrote, On 29/10/09 10:47:
Craig Falconer wrote:
Then two ways to progress....
0    Boot in single user mode
1    Add one new drive to the machine, and partition it with similar but larger partitions as appropriate.
2    Then use
        mdadm --add /dev/md3 /dev/sdb4
        mdadm --add /dev/md2 /dev/sdb3
        mdadm --add /dev/md1 /dev/sdb2
        mdadm --add /dev/md0 /dev/sdb1
        sysctl -w dev.raid.speed_limit_max=9999999
3    While this is happening run
        watch --int 10 cat /proc/mdstat
     Wait until all the drives are synched.
4    If you boot off this raidset you'll need to reinstall a boot loader on each drive.
5    Down the machine and remove the last 320 GB drive.
6    Install the other new drive, then boot.
7    Partition the other new drive the same as the first big drive (see the sfdisk sketch after this list).
8    Repeat steps 2 and 3 but use sda rather than sdb.
     Once they're finished synching you can grow your filesystems to their full available space (see the resize sketch after this list).
9    Do the boot loader install onto both drives again.
10   Then you can reboot and it should all be good.
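
For step 7, one way to get identical partitions on the second big drive is to copy the table across with sfdisk - just a sketch, and it assumes the first new drive really is sdb and the second one sda at that point, so check fdisk -l before running it:

        sfdisk -d /dev/sdb | sfdisk /dev/sda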
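
For the growing-the-filesystems part, a rough sketch assuming the filesystems are ext3 sitting directly on the md devices (no LVM), shown for md3 only - repeat for each array you want to enlarge, and have a backup handy:

        mdadm --grow /dev/md3 --size=max
        resize2fs /dev/md3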

I have a new drive installed, partitioned and formatted, ready to add to the raidset. First, some questions related to the above to ease my mind before proceeding. Is it necessary to boot to single user mode (and why?), since this will make the machine unavailable to the network as a file server for the duration of the process? The machine is used solely to serve up files. Based on the time it took to re-add the drive last week, it would need to be offline for some hours, which means either a very late (start and) finish to a work day or doing it at a weekend to keep it available to users during working days.

You're right - single user is not necessary. The only real reason for doing that is so that files aren't changed on your only disk and then lost if something fails before the synch has completed.

BTW I did this last night on a live box and it worked fine.



From my reading of man mdadm, it suggests doing a fail and remove of the faulty drive, possibly at the same time as adding a new device, like:
mdadm /dev/md0 --add /dev/sda1 --fail /dev/sdb1 --remove /dev/sdb1

Is this a good process to follow or is it redundant/unnecessary?

Sounds silly actually - remove the only good drive as you add the blank one?
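
Going by the drive letters mentioned further down (sdb is the dying drive, sdd is the new one), the fail/remove does make sense as long as it points at the dud partition rather than the good one. A sketch for md0 only - repeat for the other arrays with the matching partition numbers:

        mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
        mdadm /dev/md0 --add /dev/sdd1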


Just in case I run into issues reinstalling the boot loader from a live CD: am I right that, as an interim measure, I would be able to boot the machine from just the current good drive (with a single partition marked as bootable) by disconnecting the new drive?

As long as the good drive is bootable it will be fine. I had an issue where the boot loader was only on the second drive of a raid1, but the machine was fine until that second drive gave out. The first drive then wasn't bootable.

You will want something like this for grub:

# grub --batch --no-floppy

then type in

root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
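
One thing worth checking first is that (hd0) and (hd1) really point at the two raid members - the grub shell takes that mapping from its device map, so have a look at it before running setup:

        cat /boot/grub/device.map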




Finally, I'm somewhat unclear how the resulting partitions are going to work out. The current failing drive is /dev/sdb, /dev/sdc holds backups, and the new larger drive comes up as /dev/sdd. Surely once sdb is physically removed, sdc and sdd move up a letter, and this messes with adding to the raid array as sdd? Or is a better approach to do a fail & remove of the failing drive, physically remove it, and put the new drive on the same SATA connector?

Check your dmesg output for things like
md:  adding sda5 ...
md: sda3 has different UUID to sdb5
md: sda2 has different UUID to sdb5
md: sda1 has different UUID to sdb5
md: created md1


As long as the partition type is FD, the kernel will try to use it to assemble a raid device - the members are matched up by UUID, as in the dmesg lines above, not by drive letter.
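
If you want to double-check what the kernel will find before rebooting, something like this shows the partition type and which array a partition thinks it belongs to (sdd1 here is just an example - substitute whatever the disk actually comes up as):

        fdisk -l /dev/sdd
        mdadm --examine /dev/sdd1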


--
Craig Falconer
