Re: [Linux-PowerEdge] 2 predicted failure disks and RAID5

Stephen Dowdy Tue, 14 Nov 2017 11:18:06 -0800

On 11/14/2017 11:52 AM, Grzegorz Bakalarski wrote:
> Thanks for valuable input.
> Regarding punctured block:  from fwtermlog I got several (not much) lines of 
> type:
> 
> 11/13/17  3:24:45: EVT#08603-11/13/17  3:24:45:  97=Puncturing bad block on 
> PD 02(e0x20/s2) at 9ecd
That's bad.  You have a punctured stripe.
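
If you want to check whether every puncture event really points at the same
PD and block, here is a minimal Python sketch that tallies them.  The regular
expression is inferred only from the single log line quoted above, so treat
the pattern (and the names PUNCTURE_RE and count_punctures) as assumptions;
other firmware versions may word the event differently.

```python
import re
from collections import Counter

# Pattern inferred from the one fwtermlog line quoted in this thread
# (assumption: other controllers/firmware may format the event differently).
PUNCTURE_RE = re.compile(
    r"Puncturing bad block on\s+PD\s+(?P<pd>[0-9a-fA-F]+)"
    r"\((?P<loc>[^)]*)\)\s+at\s+(?P<addr>[0-9a-fA-F]+)"
)


def count_punctures(log_text):
    """Count puncture events per (PD, enclosure/slot, block address)."""
    # \s+ also matches a newline, so a line wrapped by a mail client still parses.
    return Counter(
        (m.group("pd"), m.group("loc"), m.group("addr"))
        for m in PUNCTURE_RE.finditer(log_text)
    )


sample = (
    "11/13/17  3:24:45: EVT#08603-11/13/17  3:24:45:  97=Puncturing bad block on "
    "PD 02(e0x20/s2) at 9ecd\n"
)
for (pd, loc, addr), n in count_punctures(sample).items():
    print(f"PD {pd} ({loc}) block {addr}: {n} event(s)")
```

Feed it the full text of the log; a single key with a growing count means
the same block is being reported again and again.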


> T35:     maintainPdFailHistory=0 disablePuncturing=0 
> zeroBasedEnclEnumeration=1 disableBootCLI=1
This is an informational line indicating that the controller doesn't have the 
disablePuncturing config option set.

> All the same PD, the same bad block (different time)
> 
> Is my raid useless?

No, it's good enough to recover what data you can before you rebuild it.  
However, you can't trust the data that uses the bad block.   You'll get a read 
error from any object that maps to it.

Here's a good doc Dell put out:

https://www.dell.com/support/article/us/en/4/438291#2
   "...If the data within a punctured stripe is accessed errors will continue 
to be reported against the affected bad LBAs with no possible correction 
available. Eventually (this could be minutes, days, weeks, months, etc.), the 
Bad Block Management (BBM) Table will fill up causing one or more drives to 
become flagged as predictive failure..."

> BTW: why do think raid level migration to raid-6 with 2 additional disk would 
> be better than with one disk. I would keep VD size the same.

I'm not talking about a migration, I'm talking about a complete WIPE of what 
you have, and a recreation from scratch.  At this point, you can recover what 
you can to a staging location, rebuild, then restore.
Keep track of data with I/O errors, because it's going to have a corrupted 
block at the punctured block address.  This could (if you're lucky) be in 
unallocated space.  It could also be in filesystem structures and lead to 
wide-scale corruption of the filesystem.

I would mount it all READONLY and do a file-level dump (not a 'dd' or anything 
like that, which would migrate corrupted filesystem structures).  I typically 
'rsync' data to another machine.  You don't want any backup tool that does 
infinite retries, as it'll likely result in another disk failure (per the Dell 
doc above: repeated errors against the punctured block fill the BBM table).
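
To make the "one attempt per file, keep a list of what failed" idea concrete,
here is a minimal Python sketch.  It is not rsync and not a replacement for
it; the function name copy_tree_once and its arguments are hypothetical, and
it assumes the damaged filesystem is already mounted read-only.

```python
import os
import shutil


def copy_tree_once(src_root, dst_root):
    """Copy regular files from src_root to dst_root, one attempt per file.

    Returns a list of (path, error message) for everything that could not
    be read, so those files can be treated as suspect after the restore.
    """
    failed = []

    def note_walk_error(err):
        # os.walk calls this when a directory itself cannot be listed.
        failed.append((err.filename, err.strerror or str(err)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            # This sketch skips symlinks and special files; handle them separately.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            try:
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as err:
                # No retry: a punctured block will not read any better the
                # second time.  A partial copy may be left at the destination.
                failed.append((src, err.strerror or str(err)))
    return failed
```

Call it with the read-only mount point and the staging directory, then save
the returned list; every path in it is a file you should restore from backup
or treat as corrupt.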

> Anyway will migration too raid-6 fail with this  "awful Puncturing)???

RAID-6 is going to lessen the likelihood of a puncture, since it keeps 2 
drives' worth of parity.  While you're rebuilding a RAID5, any unrecoverable 
bad block event on any of the "good" drives during the rebuild will result in 
a puncture.  With RAID6, you still have parity to cope with an uncorrectable 
error.

The above is especially true of some of the less reliable Seagate drives from 
past years.  You can't count on them not throwing UCEs (uncorrectable errors) 
during a rebuild (or before you get the replacement drive installed), thereby 
puncturing the RAID.  :-(
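
For a rough feel of how likely an uncorrectable error is during a RAID5
rebuild, here is a back-of-the-envelope Python sketch.  Everything numeric in
it is an assumption for illustration: the 6 x 4 TB array is hypothetical, the
per-bit error rates are the kind of figures drive datasheets quote, and real
errors are not independent per bit, so take the result as an order of
magnitude only.

```python
import math


def p_unrecoverable_read(bytes_read, ure_per_bit):
    """P(at least one unrecoverable read error) while reading bytes_read,
    treating every bit as an independent trial (a simplification)."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Hypothetical array: 6 drives of 4 TB, one failed, so a RAID5 rebuild
# has to read the 5 surviving drives end to end.
drive_bytes = 4 * 10**12
survivors = 5
for rate in (1e-14, 1e-15):  # assumed datasheet-style error rates per bit
    p = p_unrecoverable_read(survivors * drive_bytes, rate)
    print(f"URE rate {rate:g}/bit: P(at least one URE during rebuild) = {p:.2f}")
```

With these assumed numbers it comes out to roughly 0.80 and 0.15.  On RAID5
any one of those errors during the rebuild means a puncture; on RAID6 with a
single failed drive the second parity can still reconstruct the block.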

--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  [email protected]        -  http://www.ral.ucar.edu/~sdowdy/

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
