RAID5's single redundancy (N-1 of N) carries real risk now that drive
capacities have grown much faster than write speeds over the past 5-10
years: rebuilds take a long time, and your problem, a second failure
before recovery completes, is the main issue.
See "RAID6" in http://en.wikipedia.org/wiki/Non-standard_RAID_levels
The usual solutions are double redundancy (RAID6, e.g. surviving 2
failures out of 4 drives) or a stripe over mirrors (a stripe across two
pairs of mirrored discs), also known as RAID1+0 and covered in the link
above.
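For example, with mdadm (just a sketch; the device names are
placeholders, not your layout):

# RAID6 over four drives: any two can fail
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[abcd]1
# md raid10 (default near layout) over four drives: effectively a
# stripe over mirrored pairs
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]1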
None of which will help you now.
Right now, you need luck and patience, or the constitution to say
goodbye to your data. Keep trying to bring your RAID back up with
redundancy (I'd give up after 3 failed rebuilds). Then you need to
re-run smartctl self-tests on every drive (something like the commands
below) and replace anything that looks bad. Then move to a
double-redundancy setup. You may need more drives.
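A rough example, run per drive (sda here is just a placeholder):

# start a long offline self-test
smartctl -t long /dev/sda
# later: check the self-test log and the SMART attributes
# (reallocated/pending sector counts are the ones to watch)
smartctl -l selftest /dev/sda
smartctl -A /dev/sda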
There really isn't much other choice. There are data recovery services
in town used by police and the like; last I read it was $300 per HD.
Not sure if they can rebuild Linux software RAID5 arrays, but maybe.
I've moved away from Linux RAID because I don't have the time for this
stuff anymore. I got a Drobo 5N and was able to move my discs in one at
a time after copying the data over. Drobo Apps exist for ssh, rsync and
NFS. It's not the fastest thing in the world, but it's fast enough to
play 1080p H.264 video with 5.1 AAC audio, which is really the main
workload for this device. It's a toaster: I set it and forget it.
On 05/12/14 12:39, Emery Guevremont wrote:
Hello,
I haven't written to this list in a while, but the situation I've put
myself in requires me to ask this mailing list for some help.
The short story is that I have a 4-disk RAID5 array where one of my
drives died. The array became degraded, so I shut down my home server
until I received a replacement drive. Upon receiving the replacement, I
used it to replace the broken drive and booted into single-user mode to
resync my RAID5 array. At about the 12% mark, another drive returned a
read error and mdadm marked that drive as failed, essentially leaving
me with a spare, a failed drive, and only 2 drives out of 4 in the
array. I know my chances of recovering from this are bleak, but I still
believe there's hope. Forget about backups; only half of my important
files are backed up. I was in the middle of building a bigger NAS to
back up my home server.
The long story, and what I've done so far:
/dev/md0 is assembled with 4 drives
/dev/sda3
/dev/sdb3
/dev/sdc3
/dev/sdd3
Two weeks ago, mdadm marked /dev/sda3 as failed. cat /proc/mdstat
showed _UUU, and smartctl also confirmed that the drive was dying. So I
shut down the server until I received a replacement drive.
This week, I replaced the dying drive with the new drive, booted into
single-user mode and did this:

mdadm --manage /dev/md0 --add /dev/sda3

A cat of /proc/mdstat confirmed that resyncing was under way. The last
time I checked, it was up to 11%. A few minutes later, I noticed that
the syncing had stopped. A read error message for /dev/sdd3 (I have a
pic of it if anyone is interested) appeared on the console, so it looks
like /dev/sdd3 might be going bad too. A cat of /proc/mdstat showed
_U_U. At that point I panicked and decided to leave everything as-is
and go to bed.
The next day, I shut down the server and rebooted with a live USB
distro (Ubuntu Rescue Remix). After booting into the live distro, a cat
of /proc/mdstat showed that my /dev/md0 was detected but every drive
had an (S) next to it, i.e. /dev/sda3 (S)... Naturally I don't like the
looks of this.
I ran ddrescue to copy /dev/sdd onto my new replacement disk
(/dev/sda). Everything worked; ddrescue hit only one read error and was
eventually able to read the bad sector on a retry.
Tonight I plan to repeat this procedure with ddrescue to clone
/dev/sdb and /dev/sdc.
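(For reference, the invocation I've been using looks roughly like this;
the exact options and the map-file path are just what I happened to
pick:)

# first pass: copy everything readable, skip the slow scraping phase
ddrescue -f -n /dev/sdd /dev/sda /root/sdd-rescue.map
# second pass: go back and retry the bad areas a few times
ddrescue -f -r3 /dev/sdd /dev/sda /root/sdd-rescue.map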
Now is where I need your help. How should I go about trying to rebuild
my array? I will be using the cloned drives to do this. My goal is
simply to assemble my RAID array in a degraded state with sdb3, sdc3
and sdd3, mount /dev/md0, and back up as many files as I can. From
various Google searches, I was going to first try:

mdadm --assemble --scan   # which I expect not to work
mdadm --assemble --force  # still not quite sure about the syntax and
                          # whether ordering is important

I've also seen people use the --assume-clean option.
But what commands should I run? If I mess up, starting over by
recloning my drives is a time-consuming step.
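(For what it's worth, from the man page I think the forced, read-only
assembly would look something like the lines below, listing the
surviving members explicitly; please correct me if this is wrong:)

# assemble degraded and read-only, naming the surviving members
mdadm --assemble --force --readonly /dev/md0 /dev/sdb3 /dev/sdc3 /dev/sdd3
# confirm it came up with 3 of 4 members, then mount read-only and
# start copying data off
cat /proc/mdstat
mount -o ro /dev/md0 /mnt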
--
Jean-Luc Cooke
+1-613-263-2983
_______________________________________________
mlug mailing list
[email protected]
https://listes.koumbit.net/cgi-bin/mailman/listinfo/mlug-listserv.mlug.ca