On Fri, Nov 21, 2014 at 5:55 AM, Ian Armstrong <bt...@iarmst.co.uk> wrote:

> In my situation what I've found is that if I scrub & let it fix the
> errors then a second pass immediately after will show no errors. If I
> then leave it a few days & try again there will be errors, even in
> old files which have not been accessed for months.

What are the devices? And if they're SSDs, are they powered off
during these few days? I take it the scrub error type is corruption?
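
If you want to double-check the error type, the per-device counters
(read vs. csum vs. verify errors) should show up in the raw scrub
stats, and the kernel log names the affected extents/files (the mount
point here is a placeholder):

    btrfs scrub status -R /mnt/data      # raw per-run stats, incl. csum_errors
    dmesg | grep -i 'checksum error'     # corruption reports from the kernel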

You can use badblocks to write a known pattern to the drive, then
power off and leave it for a few days, then read the drive back,
comparing against the pattern, to see whether there are any
discrepancies. Doing this outside of Btrfs's code path would fairly
conclusively indicate whether the problem is hardware- or
software-induced.
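
A minimal sketch of that workflow with e2fsprogs' badblocks, where
/dev/sdX is a placeholder (this destroys everything on the device):

    # WARNING: -w overwrites the whole device; /dev/sdX is a placeholder
    badblocks -swv -t 0xaa /dev/sdX   # write pattern 0xaa, verify immediately
    # ...power off, leave it unpowered for a few days, power back on...
    badblocks -sv -t 0xaa /dev/sdX    # read-only pass: compare blocks to 0xaa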

Assuming you have another copy of all of these files :-) you could
just sha256sum the two copies to see if they have in fact changed. If
they have, well, then you've got silent data corruption somewhere,
somehow. But if they always match, that suggests a bug; I don't see
how you could get bogus corruption messages without it being a bug.
When you do these scrubs that come up clean at first and then later
come up with corruptions, have you done any software updates in
between?
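
For example, to compare the two copies tree against tree (both mount
points here are placeholders for your Btrfs copy and your backup):

    # hash every file in each tree, sort by path, then diff the lists
    (cd /mnt/btrfs  && find . -type f -exec sha256sum {} + | sort -k 2) > /tmp/a.sums
    (cd /mnt/backup && find . -type f -exec sha256sum {} + | sort -k 2) > /tmp/b.sums
    diff /tmp/a.sums /tmp/b.sums   # any output means the copies diverge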


> My next step is to disable autodefrag & see if the problem persists.
> (I'm not suggesting a problem with autodefrag, I just want to remove it
> from the equation & ensure that outside of normal file access, data
> isn't being rewritten between scrubs)

I wouldn't expect autodefrag to touch old files that haven't been
accessed for months. Doesn't it only affect files that are actively
being written?
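
If you do want to rule it out, it's just a mount option: drop
autodefrag from fstab, or remount with the negated option if your
kernel is new enough to accept it (the mount point is a placeholder):

    mount -o remount,noautodefrag /mnt/data   # turn autodefrag off in place
    grep /mnt/data /proc/mounts               # confirm the active options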


-- 
Chris Murphy