That is what I had in mind the whole time while reading your error report: it sounded like a punctured stripe. And now it is confirmed :-) So it is not a hardware error, only a damaged stripe, but the PERC 6/i is the earliest controller that has a feature to repair it. For earlier controllers it is much more difficult:
- A punctured stripe can occur in a written stripe or in an empty stripe.
- To repair it in a written stripe you need a tool to locate the affected file and overwrite it with a known good copy (sometimes the backup software will tell you which file or files are damaged).
- In an empty stripe you need to write to the affected blocks (in the past you could download the MHDD utility, but it has been a long time since I last used it).

The consistency check can't fix it.

The other way is: back up the data, delete the array, recreate it with initialization, and restore. (That is what support says, as all other options are too difficult.)

Observations on how it CAN occur (only my experience, under rare circumstances, and at most in one of 1000 HDD issues):
- if you try to rebuild a disk with a media error
- if a predictive-failure disk was removed without setting it offline first

Some lines from a Dell document on addressing these errors on earlier PERCs:

If media errors reside in user space (allocated space)

The first step to fix a punctured stripe is to do a full backup of the logical disk. This will show if there are any media errors in user space, i.e. one or several files will be reported as corrupt. Any file reported corrupt must be overwritten with a known good copy. DO NOT DELETE THE FILE, since this will basically mean that the media errors are "moved" to free space. Clearing media errors in free space is possible but will require some downtime. If a copy of the file doesn't exist, that data will be lost. To still be able to clear/overwrite the media errors you will have to create a dummy file with the same name and the same size and use it to overwrite the corrupt file.

The next step is to wait until Patrol Read has completed at least one cycle/iteration, then check the Windows event log/PERC controller log. The punctured stripe has been fixed if sense key 3 11 00 doesn't show up anymore.
It's not unusual that media errors are still reported after replacing corrupt files, but the number of affected LBAs should at least have been reduced. This tells us that any remaining media errors reside in free space.

If media errors reside in free space (unallocated space)

Media errors in free space can be cleared by using the MHDD program. It's a freeware DOS program that can be used to write to a specific LBA on a specific disk on the PERC controller. It will require some downtime since the system will need to be booted from a DOS diskette.

-----Original Message-----
From: Bond Masuda [mailto:[email protected]]
Sent: Monday, May 03, 2010 9:05 PM
To: Fischer, Patrick
Cc: linux-poweredge-Lists
Subject: RE: how to get rid of bad blocks in a file on PERC 5/I?

Thanks Patrick for your reply. I know my original message was long, so perhaps it was missed, but I did run a consistency check, at least twice. However, after each CC run, we tested a dd_rescue attempt on the file in question and still had unreadable blocks. I was expecting one of two things: 1) the consistency check reporting back all sorts of problems, or 2) the unreadable blocks would go away. Neither was the case and hence I decided to reach out.

I had forgotten about the "action=exportlog", thanks for reminding me about that.
This is what I found:

04/29/10 14:45:10: EVT#17279-04/29/10 14:45:10: 97=Puncturing bad block on PD 03(e0/s3) at 33334430
04/29/10 14:45:10: EVT#17280-04/29/10 14:45:10: 97=Puncturing bad block on PD 06(e0/s6) at 33334430
04/29/10 14:45:11: EVT#17282-04/29/10 14:45:11: 97=Puncturing bad block on PD 04(e0/s4) at 33334430
04/29/10 14:45:11: EVT#17283-04/29/10 14:45:11: 97=Puncturing bad block on PD 06(e0/s6) at 33334430
04/29/10 14:45:11: EVT#17284-04/29/10 14:45:11: 97=Puncturing bad block on PD 03(e0/s3) at 33334430

-Bond

> -----Original Message-----
> From: [email protected] [mailto:linux-poweredge-
> [email protected]] On Behalf Of [email protected]
> Sent: Monday, May 03, 2010 2:46 AM
> To: [email protected]; [email protected]
> Cc: [email protected]
> Subject: RE: how to get rid of bad blocks in a file on PERC 5/I?
>
> Consistency Check:
> Check consistency. A check consistency determines the integrity of a
> virtual disk's redundant data. When necessary, this feature rebuilds
> the redundant information.
> Source:
> http://support.dell.com/support/edocs/software/svradmin/6.2/en/OMSS/cntrls.htm#wp681476
>
> The remapping of bad sectors should run automatically if the sector
> can't be written and the controller tries to write to or read from it.
>
> Please always check the controller log if you get filesystem errors
> like the ones you described.
> Check the log for bad LBAs on the disk, e.g. by searching the log file
> for "bad".
> Check the count of the LBAs and check whether it occurs on multiple
> disks, like a punctured stripe....
>
> You can get the log via MegaCLI or OpenManage:
>
> Server Administrator CLI:
> omconfig storage controller action=exportlog controller=0
> where controller 0 = ID of the involved controller

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq
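
As a rough illustration of the log check suggested in this thread (search the exported controller log for bad LBAs and see whether the same LBA shows up on multiple disks), here is a minimal Python sketch. It assumes the exported log is plain text and that puncture events look like the "Puncturing bad block on PD ..." lines quoted above. The function names are made up for this example, and other firmware versions may well log in a different format.

```python
import re
from collections import defaultdict

# Assumed event format, taken from the log lines quoted in this thread:
#   ... 97=Puncturing bad block on PD 03(e0/s3) at 33334430
PUNCTURE_RE = re.compile(
    r"Puncturing bad block on PD (\w+)\((e\d+/s\d+)\) at ([0-9A-Fa-f]+)"
)


def punctured_blocks(log_text):
    """Map each punctured LBA (kept as a string) to the disks reporting it."""
    disks_by_lba = defaultdict(set)
    for match in PUNCTURE_RE.finditer(log_text):
        pd_id, slot, lba = match.groups()
        disks_by_lba[lba].add(f"PD {pd_id} ({slot})")
    return {lba: sorted(disks) for lba, disks in disks_by_lba.items()}


def multi_disk_lbas(log_text):
    """Keep only LBAs punctured on more than one disk (punctured-stripe hint)."""
    return {
        lba: disks
        for lba, disks in punctured_blocks(log_text).items()
        if len(disks) > 1
    }


SAMPLE = """\
04/29/10 14:45:10: EVT#17279-04/29/10 14:45:10: 97=Puncturing bad block on PD 03(e0/s3) at 33334430
04/29/10 14:45:10: EVT#17280-04/29/10 14:45:10: 97=Puncturing bad block on PD 06(e0/s6) at 33334430
04/29/10 14:45:11: EVT#17282-04/29/10 14:45:11: 97=Puncturing bad block on PD 04(e0/s4) at 33334430
"""

if __name__ == "__main__":
    for lba, disks in multi_disk_lbas(SAMPLE).items():
        print(f"LBA {lba}: {', '.join(disks)}")
```

To run it against a real export, read the file produced by "omconfig storage controller action=exportlog" into a string and pass it to multi_disk_lbas(). The LBA is deliberately left as text, since the log does not say whether the value is decimal or hexadecimal.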
