Re: How does btrfs handle bad blocks in raid1?

George Mitchell Tue, 14 Jan 2014 12:30:15 -0800

On 01/14/2014 11:13 AM, Chris Murphy wrote:

On Jan 9, 2014, at 6:31 PM, George Mitchell <geo...@chinilu.com> wrote:

Jim, my point was that IF the drive does not successfully resolve the bad block 
issue and btrfs takes a write failure every time it attempts to overwrite the 
bad data, it is not going to remap that data, but rather it is going to fail 
the drive.

If the drive doesn't resolve a bad block on write, then the drive is toast. 
That's how md handles it. That's even how manufacturers handle it. The point at 
which write failures occur means there are no reserve sectors left, or the head 
itself is having problems writing data to even good sectors. Either way, the 
drive isn't reliable for rw purposes and coming up with a bunch of code to fix 
bad drives isn't worth development time in my opinion. Such a drive is vaguely 
interesting for test purposes however, because even though the drive is toast, 
we'd like the system to remain stable with it connected first and foremost. And 
maybe we'd want it as a source during rebuild/replacement.

  In other words, if the drive has a bad sector which it has not done anything 
about at the drive level, btrfs will not remap the sector.  It will, rather, 
fail the drive. Is that not correct?

I've skimmed for this in the code, but haven't found it, so I'm not sure what 
the handling is. It's probably easier to take a drive I don't care about, and 
use hdparm to cause a sector to be flagged as bad, and see how Btrfs handles 
it. (The hdparm command should be clearable, but I'd rather not screw up a 
drive I like.)
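
A rough sketch of the hdparm experiment described above, written as a dry run
that only prints the commands. The device path /dev/sdX, the sector number, and
the helper name bad_sector_plan are placeholders assumed for illustration, not
values from this thread. The --make-bad-sector and --write-sector options are
documented in the hdparm man page as exceptionally dangerous, so this should
only ever be pointed at a scratch drive.

```shell
#!/bin/sh
# Dry-run sketch: prints the hdparm commands instead of executing them.
# DEV and LBA are placeholders (assumptions), not values from this thread.
DEV="${DEV:-/dev/sdX}"
LBA="${LBA:-123456}"

# Print the three steps of the experiment for a given device and sector.
bad_sector_plan() {
    dev="$1"
    lba="$2"
    # 1. Flag the sector as bad so that reads of it return a media error.
    echo "hdparm --yes-i-know-what-i-am-doing --make-bad-sector $lba $dev"
    # 2. Confirm that the sector now fails to read.
    echo "hdparm --read-sector $lba $dev"
    # 3. Clear the flag afterwards by rewriting the sector with zeros.
    echo "hdparm --yes-i-know-what-i-am-doing --write-sector $lba $dev"
}

bad_sector_plan "$DEV" "$LBA"
```

Running a Btrfs scrub or reading a file that covers the sector between steps 2
and 3 would then show how the filesystem reacts.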

Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris, please don't misunderstand me. I am not advocating that btrfs or any
other filesystem should be dealing with bad blocks. I believe very strongly
that if the drive firmware can't deal with that transparently the drive is,
indeed, toast, and should be tossed. And the key to monitoring hard drive
health, in my opinion, is SMART, and what we are lacking at this point is a
SMART capability to provide visual notifications to the user when any hard
drive starts to seriously degrade or suddenly fails. This would ideally be
mediated by journal daemon, which desperately needs to be enhanced to provide
visual and ideally audible pop-up warnings to the user in such cases. It would
be nice to have those notifications from btrfs as well, also mediated by
journal daemon, but this is really a SMART specialty and SMART should be our
first line of defense. Where we need btrfs to move is toward automated
resiliency, automatically dropping the bad drive(s) and automatically
following up with a rebalance and return to sanity. If SMART were capable of
launching pop-up warnings, btrfs would not have to worry so much about arrays
going simplex undetected. And it should really be the user's responsibility to
be running SMART and providing a sufficient number of drives AND sufficient
additional free space in their RAID arrays to accommodate potential drive
failure and still retain the desired level of redundancy. That is where I
stand on this.

