Re: [PATCH 000 of 5] md: Introduction

PFC Wed, 18 Jan 2006 23:22:28 -0800

While we're at it, here's a little issue I had with RAID5; not really the fault of md, but you might want to know...

I have a 5x250GB RAID5 array for home storage (digital photos, my lossless ripped CDs, etc.): 1 IDE drive and 4 SATA drives. Now, it turns out one of the SATA drives is a Maxtor 6V250F0, and these have problems; it died, then was RMA'd, then died again. Finally, it turned out this drive series is incompatible with nvidia SATA chipsets. A third drive seems to work, with the jumper set to SATA 150.

        Back to the point.

The failure mode of these drives is an IDE command timeout. This takes a long time! So, when the drive has failed, each command to it takes forever. md will eventually reject said drive, but it takes hours; and meanwhile, the computer is unusable and the data is offline...
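
To put a rough number on "forever", the timeout lines in the log at the end of this message can be lined up with a small script. This is only a sketch: the three sample lines are copied from that log, while the helper names (timeout_times, gaps_seconds) and the year passed to the parser are assumptions made for illustration, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Three "command ... timeout" lines copied from the log at the end of this message.
LOG = """\
Jan  8 21:39:11 apollo13 ata4: command 0xca timeout, stat 0xd0 host_stat 0x21
Jan  8 21:39:41 apollo13 ata4: command 0xca timeout, stat 0xd0 host_stat 0x21
Jan  8 21:40:11 apollo13 ata4: command 0x35 timeout, stat 0xd0 host_stat 0x21
"""

# Matches the syslog timestamp, the hostname, and an ataN "command ... timeout" message.
TIMEOUT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ ata\d+: command 0x[0-9a-f]+ timeout"
)


def timeout_times(log, year=2006):
    """Return the timestamps of all command-timeout lines (year is an assumption)."""
    times = []
    for line in log.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            # Collapse the double space syslog uses for single-digit days.
            stamp = " ".join(m.group(1).split())
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Return the gap in whole seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_seconds(timeout_times(LOG)))
```

On these three lines the gaps are 30 seconds each, so every command that has to time out on the dead drive appears to cost about half a minute, which would add up quickly if many commands are outstanding.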

In this case, the really tempting solution is to hit the windows key (er, the hard reset button); but doing this makes the array dirty and degraded, and it won't mount, and all data is seemingly lost. (Well, recoverable with a bit of hacking /* goto error; */, but that's not very clean...)

This isn't really an md issue, but it's really annoying only when using RAID, because it makes a normal process (kicking a dead drive out) so slow it's almost non-functional. Is there a way to modify the timeout in question?

Note that, re-reading the log below, it writes "Disk failure on sdd1, disabling device. Operation continuing on 4 devices", but errors continue to come, and the array is still unreachable (i.e. cat /proc/mdstat hangs, etc.). Hmm...


        Thanks for the time.


Jan  8 21:38:41 apollo13 ReiserFS: md2: checking transaction log (md2)
Jan  8 21:39:11 apollo13 ata4: command 0xca timeout, stat 0xd0 host_stat 0x21
Jan  8 21:39:11 apollo13 ata4: translated ATA stat/err 0xca/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Jan  8 21:39:11 apollo13 ata4: status=0xca { Busy }
Jan  8 21:39:11 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002
Jan  8 21:39:11 apollo13 sdd: Current: sense key=0xb
Jan  8 21:39:11 apollo13 ASC=0x47 ASCQ=0x0
Jan  8 21:39:11 apollo13 Info fld=0x3f
Jan  8 21:39:11 apollo13 end_request: I/O error, dev sdd, sector 63
Jan  8 21:39:11 apollo13 raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
Jan  8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977
Jan  8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977
Jan  8 21:39:11 apollo13 ATA: abnormal status 0xD0 on port 0x977

Jan  8 21:39:41 apollo13 ata4: command 0xca timeout, stat 0xd0 host_stat 0x21
Jan  8 21:39:41 apollo13 ata4: translated ATA stat/err 0xca/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Jan  8 21:39:41 apollo13 ata4: status=0xca { Busy }
Jan  8 21:39:41 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002
Jan  8 21:39:41 apollo13 sdd: Current: sense key=0xb
Jan  8 21:39:41 apollo13 ASC=0x47 ASCQ=0x0
Jan  8 21:39:41 apollo13 Info fld=0x9840097
Jan  8 21:39:41 apollo13 end_request: I/O error, dev sdd, sector 159645847
Jan  8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977
Jan  8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977
Jan  8 21:39:41 apollo13 ATA: abnormal status 0xD0 on port 0x977

Jan  8 21:40:01 apollo13 cron[17973]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jan  8 21:40:11 apollo13 ata4: command 0x35 timeout, stat 0xd0 host_stat 0x21
Jan  8 21:40:11 apollo13 ata4: translated ATA stat/err 0x35/00 to SCSI SK/ASC/ASCQ 0x4/00/00
Jan  8 21:40:11 apollo13 ata4: status=0x35 { DeviceFault SeekComplete CorrectedError Error }
Jan  8 21:40:11 apollo13 sd 3:0:0:0: SCSI error: return code = 0x8000002
Jan  8 21:40:11 apollo13 sdd: Current: sense key=0x4
Jan  8 21:40:11 apollo13 ASC=0x0 ASCQ=0x0
Jan  8 21:40:11 apollo13 end_request: I/O error, dev sdd, sector 465232831
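
For anyone reading the status bytes in the log above, here is a minimal sketch of how such a byte maps to the flag names printed in braces. It assumes the standard ATA status register bit layout; the names Busy, DeviceFault, SeekComplete, CorrectedError and Error are the ones that appear in the log, the remaining names and the function decode_status are chosen here for illustration. Treating a set Busy bit as making the other bits meaningless is an assumption that happens to match the "{ Busy }" lines.

```python
# ATA status register bits, highest bit first (standard layout assumed).
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(stat):
    """Return the flag names set in an ATA status byte.

    While Busy is set the other bits are not considered valid,
    so only "Busy" is reported in that case.
    """
    if stat & 0x80:
        return ["Busy"]
    return [name for mask, name in ATA_STATUS_BITS if stat & mask]


if __name__ == "__main__":
    for value in (0xD0, 0xCA, 0x35):
        print(hex(value), "{ " + " ".join(decode_status(value)) + " }")
```

With this decoding, 0x35 gives the same four flags as the last status line in the log, and both 0xd0 and 0xca come out as Busy only.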