Kit,
If you have another (non-RAID) SCSI system, you could take the faulty drive there to modify the mode pages to turn on AWRE and ARRE with either sgmode (scsirastools.sf.net) or sginfo (sg3_utils).
Otherwise, you are dependent on the tools that are provided for the PowerEdge RAID controller.

Andy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Douglas Gilbert
Sent: Tuesday, February 01, 2005 7:44 AM
To: Kit Gerrits
Cc: [EMAIL PROTECTED]
Subject: Re: Disk Errors

Kit Gerrits wrote:
> I have found 08:05 to correspond to /dev/sda5, mounted as /usr (Thanks for
> the pointer!).
>
> Sda is the single-drive volume
> (non-RAID, as it is only for the O/S,
> which needs to be speedy and can be pulled from tape easily).
>
> This explains several things:
> A/ Why a single error can take an entire volume offline
> B/ Why the error is not logged
>    If it only took the partition offline,
>    it would still have been logged,
>    as / is mounted from sda3
>
> And leaves one question:
> What caused the error?
>
> There are no GROWN defects on the drive in this volume

Kit,
A block/sector is added to the grown defect list after it
has been reassigned. Reassignment occurs automatically for
recoverable (medium) errors if the AWRE and/or ARRE bits are
set (those bits are in the read write error recovery mode page).

So there are two situations in which damaged blocks remain
accessible:
  1) unrecoverable medium errors
  2) recoverable medium errors when AWRE and/or ARRE
     are clear

Case 2) can be ignored ** or could be handled by setting
ARRE and then reading the whole disk (e.g. with dd).
Both cases can be handled with the REASSIGN BLOCKS SCSI
command once the defective logical block address (lba) or
addresses have been identified.

Using the sg3_utils package various things can be done:
  - "sginfo -e /dev/sda" will show the AWRE and ARRE
    settings. Changing them with sginfo is a bit ugly
  - "sginfo -G /dev/sda" will show the grown defect list
    in "index" format (up to 3 other formats may be
    available)
  - "sg_dd if=/dev/sg0 of=/dev/null bs=512" will read the
    whole disk or fail at the first unrecoverable (medium)
    error.
    If a medium error is detected the "info" field
    is the lba of the defect. ***
  - "sg_reassign -a <lba> /dev/sda" will reassign the
    <lba> block. If this succeeds <lba> should appear
    in the grown defect list ("sginfo -G -Flogical /dev/sda").

When a logical block with unrecoverable errors is reassigned
then the new contents are vendor specific. I'm not sure how
file systems react to this.

** recoverable errors can be ignored. Assuming these recoverable
   errors occur on read operations then the "read error counter"
   log page's recovered error counter (one of them depending
   on the duration of the recovery process) will be incremented

*** due to error processing, it is still better to use /dev/sg0
    rather than /dev/sda with the sg_dd utility. Recent
    changes (lk 2.6.11-rc2-bk8) make the following work:
    "sg_dd if=/dev/sda blk_sgio=1 of=/dev/null bs=512"
    in the presence of errors

Doug Gilbert

> ---------------
> Reference logs:
> ---------------
>
> Executing: disk show defects (ID=0)
> Number of PRIMARY defects on drive: 1912
> Number of GROWN defects on drive: 0
>
> Executing: container list
> Num          Total  Oth Chunk          Scsi   Partition
> Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
> ----- ------ ------ --- ------ ------- ------ -------------
>  0    Volume 8.47GB            Open    0:00:0 64.0KB:8.47GB
>  /dev/sda             NT
>  1    RAID-5 16.9GB       32KB Open    0:01:0 64.0KB:8.47GB
>  /dev/sdb             DATA             0:02:0 64.0KB:8.47GB
>                                        ?:??:? - Missing -
>
> Mount points it to:
> # /dev/sda5             5.3G  1.5G  3.6G  30% /usr
>
>>-----Original message-----
>>From: Salyzyn, Mark [mailto:[EMAIL PROTECTED]]
>>Sent: Tuesday, 1 February 2005 4:15
>>To: Kit Gerrits
>>Subject: RE: Disk errors
>>
>>The controller does not appear to be busted; you have a Volume and a
>>RAID-5. Are you missing an Array?
>>
>>A two drive failure on a RAID-5 gives you an offline array.
>>
>>A single drive failure in a Volume gives you an offline array.
>>
>>You need to find who is 08:05, look through /dev for the major/minor
>>number and relate it to the 'device'. Look through /proc/scsi/scsi and
>>/var/log/messages to help correlate it.
>>
>>Sincerely -- Mark Salyzyn
>>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
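
Mark's advice to relate 08:05 to a device can be sketched in a few lines of Python. This is only an illustration and not part of the original thread. It assumes the kernel printed the pair as hexadecimal major:minor, and that major 8 covers the first 16 sd disks with 16 minors each (the classic Linux allocation). The function names are hypothetical.

```python
import string

SD_MAJOR = 8            # assumed: first SCSI disk major on Linux
MINORS_PER_DISK = 16    # assumed: minor 0 = whole disk, 1..15 = partitions


def parse_pair(text):
    """Parse a pair such as '08:05' into (8, 5), reading both fields as hex."""
    major, minor = text.split(":")
    return int(major, 16), int(minor, 16)


def sd_name(major, minor):
    """Return the sd node name for a (major, minor) pair under major 8."""
    if major != SD_MAJOR:
        raise ValueError("this sketch only handles major 8 (sda..sdp)")
    if not 0 <= minor < 256:
        raise ValueError("minor out of range for major 8")
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "/dev/sd" + string.ascii_lowercase[disk]
    # Partition 0 is the whole disk, so no suffix is added for it.
    return name if part == 0 else name + str(part)


if __name__ == "__main__":
    print(sd_name(*parse_pair("08:05")))  # /dev/sda5
```

On a live system the same answer should also be visible with "ls -l /dev/sda5", which lists the major and minor numbers in place of the file size.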

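The whole-disk read that Doug describes with sg_dd can be roughly approximated in Python where sg3_utils is not available. The sketch below is an illustration under stated assumptions, not a replacement for sg_dd: it reads through the ordinary block layer rather than /dev/sg0, so (as Doug notes about error processing) the block number it reports should be treated as approximate. The function name and the injectable pread parameter are hypothetical.

```python
import os


def first_unreadable_block(path, block_size=512, pread=os.pread):
    """Read path sequentially; return the index of the first block whose
    read raises OSError, or None if the end is reached without an error."""
    fd = os.open(path, os.O_RDONLY)
    try:
        lba = 0
        while True:
            try:
                data = pread(fd, block_size, lba * block_size)
            except OSError:
                # The kernel reported an I/O error for this offset.
                return lba
            if not data:
                # Empty read means end of device or file.
                return None
            lba += 1
    finally:
        os.close(fd)


if __name__ == "__main__":
    import sys
    # Usage (as root, read-only): python scan.py /dev/sda
    print(first_unreadable_block(sys.argv[1]))
```

Reading 512 bytes at a time is slow. It is chosen here only so the returned index lines up with the bs=512 used in the sg_dd examples.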
