RAID5's single redundancy (N-1 usable out of N drives) is risky now that capacity-to-write-speed ratios are as high as they have been for the past 5-10 years: rebuilds take long enough that a second failure before recovery completes, which is exactly your problem, is the main issue. See "RAID 6" in http://en.wikipedia.org/wiki/Standard_RAID_levels

The usual preventive solutions are double redundancy (RAID6: with 4 drives, any 2 can fail) or a stripe over mirrored pairs (two arrays of 2 mirrored discs), also known as RAID1+0, covered at the same link.
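If you're rebuilding from scratch later, either layout is a one-liner with mdadm. A sketch only: device names are assumed, and --create destroys whatever is on the partitions:

mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3   # double redundancy
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3  # striped mirrors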

None of which will help you now.

Right now, you need luck and patience, or the constitution to say goodbye to your data. Keep trying to bring your RAID back up with redundancy (I'd give up after 3 failed rebuilds). Then re-run smartctl self-tests and replace anything that looks bad. Then move to a double-redundancy setup. You may need more drives.
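For the smartctl round, something like this on each drive (sdX is a placeholder):

smartctl -t long /dev/sdX      # kick off an extended self-test, takes hours
smartctl -l selftest /dev/sdX  # read the results once it's done
smartctl -H /dev/sdX           # overall SMART health verdict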

There really isn't much other choice. There are data recovery services in town used by the police and the like. Last I read, it was $300 per HD. Not sure if they can rebuild Linux software RAID5 arrays, but maybe.

I've moved away from Linux RAID because I don't have the time for this stuff anymore. I got a Drobo 5N and was able to move my discs in one at a time after copying the data in. Drobo Apps exist for ssh, rsync and nfs. It's not the fastest thing in the world, but it's fast enough to play 1080p h.264 video with 5.1 AAC, which is really the main workload for this device. It's a toaster: I set it and forget it.

On 05/12/14 12:39, Emery Guevremont wrote:
Hello,

I haven't written to this list in a while, but the situation I've put myself in requires me to ask this mailing list for some help.

The short story: I have a 4-disk RAID5 array where one of the drives died. The array became degraded, so I shut down my home server until I received a replacement drive. When it arrived, I used it to replace the broken drive and booted into single-user mode to resync the RAID5 array. At about the 12% mark, another drive returned a read error and mdadm marked it as failed, essentially leaving me with a spare, a failed drive, and only 2 drives out of 4 in the array. I know my chances of recovery are bleak, but I still believe there's hope. Forget about backups; only half of my important files are backed up. I was in the middle of building a bigger NAS to back up my home server.

The long story, and what I've done:

/dev/md0 is assembled from 4 partitions:
/dev/sda3
/dev/sdb3
/dev/sdc3
/dev/sdd3

2 weeks ago, mdadm marked /dev/sda3 as failed; cat /proc/mdstat showed _UUU. smartctl also confirmed that the drive was dying, so I shut down the server until I received a replacement drive.

This week, I replaced the dying drive with the new one, booted into single-user mode, and ran:

mdadm --manage /dev/md0 --add /dev/sda3

A cat of /proc/mdstat confirmed the resyncing process; the last time I checked, it was up to 11%. A few minutes later, I noticed that the syncing had stopped. A read error message for /dev/sdd3 (I have a pic of it if anyone is interested) appeared on the console, so it appears /dev/sdd3 might be going bad too. A cat of /proc/mdstat showed _U_U. At that point I panicked and decided to leave everything as-is and go to bed.

The next day, I shut down the server and rebooted with a live USB distro (Ubuntu Rescue Remix). After booting into the live distro, cat /proc/mdstat showed that my /dev/md0 was detected, but every drive had an (S) next to it, i.e. /dev/sda3 (S)..., meaning they had all been assembled as spares. Naturally, I don't like the looks of this.

I ran ddrescue to copy /dev/sdd onto my new replacement disk (/dev/sda). Everything worked; ddrescue hit only one read error, and it was eventually able to read the bad sector on a retry.

Tonight I plan to repeat this procedure with ddrescue to clone /dev/sdb and /dev/sdc.
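Roughly the same invocation as last time, with a map file so an interrupted run can resume (the destination /dev/sdX is a placeholder for whichever blank drive I clone onto):

ddrescue -f -n /dev/sdb /dev/sdX sdb.map   # first pass, skip slow/bad areas
ddrescue -f -r3 /dev/sdb /dev/sdX sdb.map  # second pass, retry bad sectors up to 3 times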

Now is where I need your help. How should I go about trying to rebuild my array? I will be using the cloned drives to do this. My goal is simply to assemble the array in a degraded state with sdb3, sdc3 and sdd3, mount /dev/md0, and back up as many files as I can. From various Google searches, I was going to first try:

mdadm --assemble --scan   # which I expect not to work
mdadm --assemble --force  # still not quite sure about the syntax, or whether drive ordering matters
I've also seen people using the --assume-clean option.
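Put together, I'm guessing the sequence would look something like this (device names assumed unchanged from above; mounted read-only so nothing writes to the array). From what I've read, --assume-clean actually belongs to --create rather than --assemble, and re-creating over an existing array is a last resort:

mdadm --stop /dev/md0   # clear the all-spares assembly first
mdadm --assemble --force /dev/md0 /dev/sdb3 /dev/sdc3 /dev/sdd3
mount -o ro /dev/md0 /mnt   # read-only, then copy files off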

But what commands should I run? If I mess up, starting over by re-cloning my drives is a time-consuming step.
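One idea I've seen for exactly this problem is to put a copy-on-write overlay over each clone with device-mapper, so a failed experiment only dirties the overlay file and the clones stay pristine. Untested on my end; the names and the 4GB overlay size are placeholders:

dd if=/dev/zero of=sdb3.ovl bs=1M count=0 seek=4096   # 4GB sparse overlay file
losetup /dev/loop1 sdb3.ovl
dmsetup create sdb3cow --table "0 $(blockdev --getsz /dev/sdb3) snapshot /dev/sdb3 /dev/loop1 P 8"

Then assemble the array from /dev/mapper/sdb3cow and friends instead of the real partitions.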


--
Jean-Luc Cooke
+1-613-263-2983

_______________________________________________
mlug mailing list
[email protected]
https://listes.koumbit.net/cgi-bin/mailman/listinfo/mlug-listserv.mlug.ca
