Re: Strange intermittant errors + RAID doesn't fail the disk.

Mattias Wadenstein Fri, 07 Jul 2006 01:58:35 -0700

On Fri, 7 Jul 2006, Neil Brown wrote:

On Thursday July 6, [EMAIL PROTECTED] wrote:

I suggest you find a SATA related mailing list to post this to (Look
in the MAINTAINERS file maybe) or post it to linux-kernel.


linux-ide couldn't help much, aside from recommending a bleeding-edge
patchset which should fix a lot of things SATA:
http://home-tj.org/files/libata-tj-stable/

What fixed the error, though, was exchanging one of the cables. (Just
my luck, it was new and supposedly quality, ... oh well)

I'm still interested in why the md code didn't fail the disk. While it
was 'up' any access to the array would hang for a long time,
ultimately fail and corrupt the fs to boot. When I failed the disk
manually everything was fine (if degraded) again.


md is very dependent on the driver doing the right thing.  It doesn't
do any timeouts or anything like that - it assumes the driver will.
md simply trusts the return status from the drive, and fails a drive
if and only if a write to the drive is reported as failing (if a read
fails, md tries to over-write with good data first).

Hmm.. Perhaps a bit of extra logic there might be good? If you try to
re-write the failing bit with good data, try to read the recently written
data back (perhaps after a bit of wait). If that still fails, then fail
the disk.

If it can't remember recently written data, it is clearly unsuitable for a
running system. But the occasional block going bad (and getting remapped
at a write) wouldn't trigger it.
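
To make the suggestion concrete, here is a rough sketch of the
rewrite-then-verify idea as a small Python simulation. It is not md code:
FakeDisk, handle_read_error and settle_seconds are hypothetical names
invented for illustration, and the "disk" is just an in-memory dict.

```python
import time


class FakeDisk:
    """Hypothetical in-memory stand-in for a block device (not an md API)."""

    def __init__(self, forgets_writes=False):
        self.blocks = {}
        self.forgets_writes = forgets_writes
        self.failed = False

    def write(self, sector, data):
        # A disk that "forgets" accepts the write but never stores it.
        if not self.forgets_writes:
            self.blocks[sector] = data
        return True  # the driver reports success either way

    def read(self, sector):
        # Returns None to model a read error.
        return self.blocks.get(sector)


def handle_read_error(disk, sector, good_data, settle_seconds=0.0):
    """On a read error: rewrite, wait, read back, fail the disk on mismatch."""
    if not disk.write(sector, good_data):
        disk.failed = True  # rule described above: a reported write failure
        return False
    time.sleep(settle_seconds)  # "perhaps after a bit of wait"
    if disk.read(sector) != good_data:
        disk.failed = True  # proposed extra rule: the data did not survive
        return False
    return True  # block was rewritten (possibly remapped); keep the disk
```

A disk that remaps a bad block on write passes the read-back and stays in
the array; one that reports success but cannot return the data gets failed.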


/Mattias Wadenstein
