On 01/14/2014 11:13 AM, Chris Murphy wrote:
On Jan 9, 2014, at 6:31 PM, George Mitchell <geo...@chinilu.com> wrote:
Jim, my point was that IF the drive does not successfully resolve the bad block
issue and btrfs takes a write failure every time it attempts to overwrite the
bad data, it is not going to remap that data, but rather it is going to fail
the drive.
If the drive doesn't resolve a bad block on write, then the drive is toast.
That's how md handles it. That's even how manufacturers handle it. The point at
which write failures occur means there are no reserve sectors left, or the head
itself is having problems writing data to even good sectors. Either way, the
drive isn't reliable for rw purposes, and coming up with a bunch of code to fix
bad drives isn't worth development time in my opinion. Such a drive is vaguely
interesting for test purposes, however, because even though the drive is toast,
we'd like the system to remain stable with it connected first and foremost. And
maybe we'd want it as a source during rebuild/replacement.
In other words, if the drive has a bad sector which it has not done anything
about at the drive level, btrfs will not remap the sector. It will, rather,
fail the drive. Is that not correct?
I've skimmed for this in the code, but haven't found it, so I'm not sure what
the handling is. It's probably easier to take a drive I don't care about, and
use hdparm to cause a sector to be flagged as bad, and see how Btrfs handles
it. (The flag set by hdparm should be clearable, but I'd rather not screw up a
drive I like.)
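
For anyone wanting to try that experiment, here is a minimal sketch of the
hdparm invocations involved. It assumes hdparm's --make-bad-sector,
--read-sector and --write-sector options and the confirmation flag hdparm
requires for destructive operations. The helper names are hypothetical, and
the functions only build argument lists; they never run anything. Only point
the resulting commands at a scratch drive.

```python
# Hypothetical helpers: they only BUILD hdparm argument lists and never run them.
# Assumes hdparm's --make-bad-sector, --read-sector and --write-sector options.

CONFIRM = "--yes-i-know-what-i-am-doing"  # hdparm requires this for destructive options


def _check(device, sector):
    """Reject obviously wrong arguments before a command is built."""
    if not device.startswith("/dev/"):
        raise ValueError("expected a block device path such as /dev/sdX")
    if not isinstance(sector, int) or sector < 0:
        raise ValueError("sector must be a non-negative integer (LBA)")


def make_bad_sector_cmd(device, sector):
    """Command that deliberately flags one sector as bad (destructive)."""
    _check(device, sector)
    return ["hdparm", CONFIRM, "--make-bad-sector", str(sector), device]


def read_sector_cmd(device, sector):
    """Command that reads the sector back, to confirm it now errors out."""
    _check(device, sector)
    return ["hdparm", "--read-sector", str(sector), device]


def clear_bad_sector_cmd(device, sector):
    """Command that overwrites the sector with zeros, which should clear the flag."""
    _check(device, sector)
    return ["hdparm", CONFIRM, "--write-sector", str(sector), device]
```

Passing one of these lists to subprocess.run() as root would perform the
operation. The sector written by the last command loses its contents.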
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris, please don't misunderstand me. I am not advocating that btrfs
or any other filesystem should be dealing with bad blocks. I believe
very strongly that if the drive firmware can't deal with that
transparently, the drive is, indeed, toast, and should be tossed. The
key to monitoring hard drive health, in my opinion, is SMART, and
what we are lacking at this point is a SMART capability to provide
visual notifications to the user when any hard drive starts to seriously
degrade or suddenly fails. This would ideally be mediated by the journal
daemon, which desperately needs to be enhanced to provide visual and
ideally audible pop-up warnings to the user in such cases. It would be
nice to have those notifications from btrfs as well, also mediated by
the journal daemon, but this is really a SMART specialty, and SMART
should be our first line of defense. Where we need btrfs to move is
toward automated resiliency: automatically dropping the bad drive(s) and
automatically following up with a rebalance and a return to sanity. If
SMART were capable of launching pop-up warnings, btrfs would not have to
worry so much about arrays going simplex undetected. And it should really
be the user's responsibility to be running SMART and to provide a
sufficient number of drives AND sufficient additional free space in their
RAID arrays to accommodate a potential drive failure and still retain the
desired level of redundancy. That is where I stand on this.
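
To make the SMART side of this concrete, here is a rough sketch of the kind of
check a notification hook could run. It assumes the attribute table layout
printed by `smartctl -A` (ten whitespace-separated columns, raw value last).
The function name, the choice of watched attributes and the sample text are
all illustrative assumptions; the sample is made up and is not output from a
real drive.

```python
# Hypothetical helper: scan the attribute table printed by `smartctl -A` and
# report raw counters that commonly signal a degrading drive.

WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def failing_attributes(smartctl_output, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    alerts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header and anything that is not an attribute row
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) > 0:
            alerts[name] = int(raw)
    return alerts


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4711
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
"""

if __name__ == "__main__":
    for name, raw in failing_attributes(SAMPLE).items():
        print(f"WARNING: {name} raw value is {raw}")
```

Whatever delivers the pop-up would sit on top of something like this; the
sketch covers only the detection step, not the notification.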