Hi,
      I've got a 6x200GB RAID 5 array that I've kept up for some time. I've 
always had a bit of trouble with stability and I've suspected a cranky 
controller, or disk, or a combo that simply doesn't work together, but I 
managed to get it up and stable for approximately 12 months. Now I'm adding a 
disk to the array and this problem has come round to bite me again and I'm 
hoping someone here can confirm my logic. I have 3 controller cards, a Promise 
IDE, a Maxtor branded SiI 680 IDE, and a SiI 3112 SATA. Previously, I had my 
drives configured so that each was a single drive, not in a master/slave 
config, but this is getting to be too much in the way of cabling, and I really 
think with modern UDMA drives that this shouldn't be necessary. I changed the 
config to get rid of some of those PATA cables. Here's a basic list of the new 
drive/controller config:

SiI 680
/dev/hda  Seagate Barracuda 200GB
/dev/hdb  Seagate Barracuda 200GB
/dev/hdc
/dev/hdd
PDC 20269
/dev/hde  Western Digital Caviar 40GB (Boot device, not part of RAID5)
/dev/hdf   Western Digital Caviar 200GB
/dev/hdg   Western Digital Caviar 200GB
/dev/hdh   Western Digital Caviar 200GB
SiI 3112
/dev/sda  Seagate Barracuda 200GB
/dev/sdb  Seagate Barracuda 200GB

I do know that WD drives are cranky in that they have different jumper settings 
for single vs master, and my jumpers were/are set correctly. Immediately on 
adopting this configuration, the array would come up, but on resyncing, I would 
receive this error:

hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=248725, high=0, low=248725, sector=248639
ide: failed opcode was: unknown
end_request: I/O error, dev hdg, sector 248639
raid5 Disk failure on hdg1, disabling device. Operation continuing on 4 devices
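
(Incidentally, for anyone decoding that status byte: 0x51 is just DriveReady 
(0x40) + SeekComplete (0x10) + Error (0x01), which is exactly how the kernel 
spells it out, and error=0x40 is the uncorrectable-error bit. A quick sketch in 
plain sh, with bit values from the ATA spec:)

```shell
#!/bin/sh
# Decode the ATA status byte from the dma_intr line. This only illustrates
# why status=0x51 prints as { DriveReady SeekComplete Error }.
status=$(( 0x51 ))
flags=""
[ $(( status & 0x40 )) -ne 0 ] && flags="$flags DriveReady"    # DRDY bit
[ $(( status & 0x10 )) -ne 0 ] && flags="$flags SeekComplete"  # DSC bit
[ $(( status & 0x01 )) -ne 0 ] && flags="$flags Error"         # ERR bit
printf 'status=0x%02x {%s }\n' "$status" "$flags"
```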

Then the machine would freeze. I'm confident that hdg did not suddenly die, as 
I've gotten these same messages back when I was having stability issues. I 
repeated the procedure and got the error again and again on hdg. In 
order to find the problematic component, I switched the cable connecting hdg 
and hdh to the SiI 680 controller, making them hdc and hdd. On trying to 
resync, I got the same error message, but at a different sector, and on hdc 
(which would be the same drive). I feel that this isolated the problem to one 
WD 200GB drive which seems to always error when in a master/slave config on 
either controller. In order to recover my data, I changed the configuration so 
that the problematic drive was back in a single-drive configuration, making 
sure to set the jumper accordingly. I am now half-way through rebuilding the 
array.

I would simply like someone to confirm my assumption that although this drive 
functions correctly in a single configuration, it has some sort of hardware 
problem and needs to be RMA'd. I don't believe anything else is at fault: I 
swapped which controller the drive was on and still saw errors, and I also 
paired the drive that had been slaved to it with another drive, and that drive 
never caused any trouble. 

Thanks for any input, and feel free to ask for more info, or suggest testing,
TJ Harrell

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html