Re: Raid5 assemble after dual sata port failure

Chris Eddington Fri, 16 Nov 2007 22:34:03 -0800

Yes, these are exactly the kind of symptoms I've experienced. I was losing a drive here and there every couple of months (mostly the last two drives, sdc and sdd), which I thought were cable problems: I would shut down, re-plug the cables and restart, and it would always work once I re-added and rebuilt the 4th disk. But now my guess is that the motherboard chipset is overheating (or maybe the drives). I have an MSI K9N Platinum (AMD/Nvidia chipset) that has 4 RAID ports plus 2 RAID ports from a separate chip. The chipset comes with a wimpy heatsink and is very hot to the touch. I had been planning to replace it but never got around to it.

I've been out of town this week, so I had someone image all three disks. He used the Ghost disk-imaging application. He said the third disk reported media problems, and about 5% of the data was not recoverable (sector errors). Using these three copied drives, the array comes up and xfs_repair still reports a bunch of inode repairs as before, but the output is a bit different, maybe even showing fewer losses. Most important, the hpa_sector errors no longer occur.


Key questions:

- I assume ddrescue will do a much better job of handling read errors when imaging a disk? My colleague used Ghost, which is just a copy tool. I don't understand the capabilities of ddrescue on RAID partitions that well.
- fdisk -l reports that all the drives are exactly the same size, with exactly the same number of sectors, as shown below. I don't quite follow the hpa_resize issue, but it appears the drives don't have hidden HPA sectors, I guess? Note that sdc is the original drive, whereas sda, sdb, and sdd are the imaged drives.

So what do you recommend I do first? Should I try xfs_repair on the Ghost copy, or just re-copy the disks myself using ddrescue? Are there special ddrescue settings I should consider to verify or correct potential HPA changes?


Thks,
Chris

Disk /dev/sda: 500.1 GB, 500107862016 bytes
/dev/sda1   1   60801   488384001   fd   Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
/dev/sdb1   1   60801   488384001   fd   Linux raid autodetect

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
/dev/sdc1   1   60801   488384001   fd   Linux raid autodetect

Disk /dev/sdd: 500.1 GB, 500107862016 bytes
/dev/sdd1   1   60801   488384001   fd   Linux raid autodetect
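A quick arithmetic check of the fdisk figures above, written as a small Python sketch. It assumes 512-byte sectors, the legacy 255-head/63-sector geometry that fdisk used for its cylinder units, and a partition starting at sector 63; none of these are stated in the post. If the derived 1K-block count matches the Blocks column, the assumptions hold.

```python
# Sanity check: does sdX1 end inside the size fdisk reports for the disk?
# Assumptions (not from the post): 512-byte sectors, 255 heads x 63
# sectors/track, and a partition that starts at sector 63.
SECTOR_BYTES = 512
HEADS, SECTORS_PER_TRACK = 255, 63

disk_bytes = 500107862016   # "Disk /dev/sdX: 500.1 GB, ... bytes"
end_cylinder = 60801        # "End" column for /dev/sdX1
start_sector = 63           # assumed first sector of the partition

disk_sectors = disk_bytes // SECTOR_BYTES
sectors_per_cylinder = HEADS * SECTORS_PER_TRACK
last_sector = end_cylinder * sectors_per_cylinder - 1   # last LBA of cylinder 60801
partition_sectors = last_sector - start_sector + 1
blocks_1k = partition_sectors * SECTOR_BYTES // 1024     # fdisk "Blocks" are 1 KiB

print(f"disk sectors:        {disk_sectors}")
print(f"partition last LBA:  {last_sector}")
print(f"1K blocks (derived): {blocks_1k}")
print(f"unused at end:       {disk_sectors - 1 - last_sector} sectors")
```

With these numbers the derived block count comes out to 488384001, matching the listing, the partition's last sector is 976768064, and about 5103 sectors remain unused at the end of each disk, so the partitions stay inside the reported disk size. The disk itself works out to 976773168 sectors, the same figure that appears in the ata_hpa_resize messages quoted further down.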


Bill Davidsen wrote:

David Greaves wrote:
Chris Eddington wrote:
Yes, there is some kind of media error message in dmesg, below. It is not random; it happens at exactly the same moments in each xfs_repair -n run.
Nov 11 09:48:25 altair kernel: [37043.300691]          res 51/40:00:01:00:00/00:00:00:00:00/e1 Emask 0x9 (media error)
Nov 11 09:48:25 altair kernel: [37043.304326] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
Nov 11 09:48:25 altair kernel: [37043.307672] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
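The "res" line carries the ATA status and error register bytes (51 and 40 here). A small Python sketch that decodes them using the standard ATA register bit names; the regex assumes libata's "res SS/EE:..." layout and is an illustration, not part of any existing tool.

```python
import re

# ATA status register bits, highest first.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
# ATA error register bits (names as in the ATA register definitions).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "TRK0NF", 0x01: "AMNF"}

# Assumed layout: "res <status>/<error>:..." with two hex digits each.
RES_RE = re.compile(r"\bres\s+([0-9a-f]{2})/([0-9a-f]{2}):", re.IGNORECASE)


def decode_bits(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


def decode_res(line):
    """Decode the status and error bytes of a libata 'res' line, or None."""
    m = RES_RE.search(line)
    if m is None:
        return None
    status, error = int(m.group(1), 16), int(m.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


line = "res 51/40:00:01:00:00/00:00:00:00:00/e1 Emask 0x9 (media error)"
print(decode_res(line))
```

For this line it yields DRDY, DSC and ERR for the status byte and UNC (uncorrectable data) for the error byte, which fits the drive's own "media error" verdict and the error recurring at the same point on every run.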
I'm not sure what an ata_hpa_resize error is...
HPA = Host Protected Area.
By any chance is this disk partitioned such that the partition size includes the HPA? If it does, this sounds at least familiar; this mailing list post may get you started: http://osdir.com/ml/linux.ataraid/2005-09/msg00002.html
In any case, run "fdisk -l" and look at the claimed total disk size and the end point of the last partition. The HPA is not included in the "disk size", so nothing should be trying to access it.
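The two figures in the ata_hpa_resize messages can also be compared directly. A minimal Python sketch that pulls them out of a kernel log, assuming (as the field names suggest) that `sectors` is the size the drive currently reports and `hpa_sectors` its native maximum, so a difference would indicate an active HPA. The regex tolerates the missing space after "ata4.00:" seen in the quoted log.

```python
import re

# Matches e.g. "ata4.00: ata_hpa_resize 1: sectors = N, hpa_sectors = M".
HPA_RE = re.compile(
    r"(?P<dev>ata\d+\.\d+):\s*ata_hpa_resize\s+\d+:\s*"
    r"sectors\s*=\s*(?P<sectors>\d+),\s*hpa_sectors\s*=\s*(?P<hpa>\d+)"
)


def hpa_report(log_text):
    """Map each ATA device to (sectors, hpa_sectors) from ata_hpa_resize lines."""
    report = {}
    for m in HPA_RE.finditer(log_text):
        report[m.group("dev")] = (int(m.group("sectors")), int(m.group("hpa")))
    return report


log = """\
Nov 11 09:48:25 altair kernel: [37043.304326] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
"""

for dev, (sectors, hpa) in hpa_report(log).items():
    # Assumption: hpa_sectors is the native maximum, so a gap means hidden sectors.
    status = "no HPA in effect" if sectors == hpa else f"{hpa - sectors} sectors hidden"
    print(f"{dev}: sectors={sectors} hpa_sectors={hpa} -> {status}")
```

Here both values are 976773168, which is exactly 500107862016 bytes / 512, the size fdisk reports, so under that reading no sectors are hidden on that drive.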
It probably explains the problems you've been having with the RAID not 'just recovering', though.

I saw this:
http://www.linuxquestions.org/questions/linux-kernel-70/sata-issues-568894/
May be the same thing. Let us know what fdisk reports.
What does smartctl say about your drive?
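On the smartctl question, the attributes most relevant to a disk throwing media errors are usually the reallocated, pending and offline-uncorrectable sector counts. A minimal Python sketch that extracts their raw values from `smartctl -A` output; it assumes the standard attribute table layout with RAW_VALUE as the tenth column, and vendor-specific tables may differ.

```python
def smart_raw_values(smartctl_output,
                     names=("Reallocated_Sector_Ct",
                            "Current_Pending_Sector",
                            "Offline_Uncorrectable")):
    """Pull RAW_VALUE for selected attributes out of `smartctl -A` text.

    Assumes the usual table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    wanted = set(names)
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header row does not.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in wanted:
            values[fields[1]] = fields[9]  # first token of RAW_VALUE, kept as text
    return values
```

Feed it the text of `smartctl -A /dev/sdX` (device name hypothetical). Non-zero or growing pending/uncorrectable counts would suggest a problem with the disk surface itself rather than with cabling.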
IMO the spare drive is no longer useful for data recovery; you may want to use ddrescue to try and copy this drive to the spare drive.

David
PS Don't get the ddrescue parameters the wrong way round if you go that route...
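On the parameter order: GNU ddrescue takes `infile outfile [mapfile]` (the mapfile was called a logfile in older releases), so the failing disk goes first. A hedged Python sketch that only builds a common two-pass command line without running it; the device names /dev/sdX and /dev/sdY and the map file name are hypothetical, and the exact long name behind `-n` (skip the slow splitting/scraping phase) varies between ddrescue versions.

```python
import shlex


def ddrescue_commands(infile, outfile, mapfile):
    """Build (but do not run) a two-pass GNU ddrescue invocation.

    Argument order is: ddrescue [options] infile outfile [mapfile].
    The SOURCE comes first; swapping them would overwrite the source disk.
    """
    if infile == outfile:
        raise ValueError("source and destination must differ")
    # Pass 1: grab the readable areas quickly, skipping the slow phase (-n).
    # -f is required because the destination is a device, not a regular file.
    first = ["ddrescue", "-f", "-n", infile, outfile, mapfile]
    # Pass 2: revisit the bad areas recorded in the map file, 3 retry passes.
    second = ["ddrescue", "-f", "-r3", infile, outfile, mapfile]
    return [shlex.join(cmd) for cmd in (first, second)]


# Hypothetical names: /dev/sdX = failing source, /dev/sdY = spare target.
for cmd in ddrescue_commands("/dev/sdX", "/dev/sdY", "sdX.map"):
    print(cmd)
```

Reusing the same map file for the second pass lets ddrescue resume where the first pass left off and concentrate on the areas that failed, instead of re-reading the whole disk.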
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

