On 12/02/2013 04:45 AM, Sebastian Ochmann wrote:
Hello,

> However, if you find such superblock checksum mismatches very often
> during scrub, it may be that
> there is something wrong with the disk!

I'm sorry, but I don't think there's a problem with my disks because I was able to trigger the errors that increment the "gen" error counter during scrub on a completely different machine and drive today. I basically performed some I/O operations on a drive and scrubbed at the same time, over and over again, until I actually saw "super" errors during scrub. But the error is really hard to trigger. It seems to me like a race condition somewhere.

So I went a step further and tried to create a repro for this. It seems like I can trigger the errors now once every few minutes with the method described below, but sometimes it really takes a long time until the error pops up, so be patient when trying this...

For the repro:

I'm using a btrfs image in RAM for this for two reasons: I can scrub quickly over and over again, and I can rule out hard drive errors. My machine has 32 GB of RAM, so that comes in handy here - if you try this on a physical drive, adjust the sizes below as necessary.

Create a tmpfs and a testing image, format as btrfs:

$ mkdir btrfstest
$ cd btrfstest/
$ mkdir tmp
$ mount -t tmpfs -o size=20G none tmp
$ dd if=/dev/zero of=tmp/vol bs=1G count=19
$ mkfs.btrfs tmp/vol
$ mkdir mnt
$ mount -o commit=1 tmp/vol mnt

Note the "commit=1" mount option. It's not strictly necessary, but I have the feeling it helps with triggering the problem...
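If you want to double-check that the option actually took effect (just a sanity check I'd add, not part of the original steps), the options column in /proc/mounts should list commit=1:

$ grep btrfs /proc/mounts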

So now we have a 19 GB btrfs filesystem in RAM, mounted in "mnt". To perform some artificial I/O, I rm and cp a Linux source tree over and over again. Suppose you have an unpacked Linux source tree available in the "/somewhere/linux" directory (and you're using bash). We'll spawn some loops, each in its own shell, that keep the filesystem busy:

$ while true; do rm -fr mnt/a; sleep 1.0; cp -R /somewhere/linux mnt/a; sleep 1.0; done
$ while true; do rm -fr mnt/b; sleep 1.1; cp -R /somewhere/linux mnt/b; sleep 1.1; done
$ while true; do rm -fr mnt/c; sleep 1.2; cp -R /somewhere/linux mnt/c; sleep 1.2; done
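If you'd rather launch all three from one shell, here is a small script sketch doing the same thing (the churn helper, the backgrounding and the trap are my additions, not part of the original steps):

#!/bin/bash
# churn.sh - spawn three remove/copy loops against the test filesystem
churn() {
    local dir="$1" delay="$2"
    while true; do
        rm -rf "$dir"
        sleep "$delay"
        cp -R /somewhere/linux "$dir"
        sleep "$delay"
    done
}
churn mnt/a 1.0 &
churn mnt/b 1.1 &
churn mnt/c 1.2 &
trap 'kill $(jobs -p)' EXIT   # stop the background loops when the script exits
wait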

Now that the filesystem is busy, we'll also scrub it repeatedly (without backgrounding, -B):

$ while true; do btrfs scrub start -B mnt; sleep 0.5; done

On my machine and in RAM, each scrub takes 0-1 seconds and the "total bytes scrubbed" should fluctuate (this seems to be especially true with commit=1, but I'm not sure). Get a beverage of your choice and wait (or use the helper loop sketched below).
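Just a sketch of such a helper (my addition; it assumes the "with N errors" summary line format shown further down): it keeps scrubbing and stops as soon as a scrub reports any errors.

$ while true; do
      out=$(btrfs scrub start -B mnt)
      echo "$out"
      echo "$out" | grep -q 'with [1-9][0-9]* errors' && break
      sleep 0.5
  done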

(about 10 minutes later)

When I was writing this repro it took about 10 minutes until scrub said:

  total bytes scrubbed: 1.20GB with 2 errors
  error details: super=2
  corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

and in dmesg:

[15282.155170] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
[15282.155176] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 2

After that, scrub is happy again and will continue normally until the same errors happen again after a few hundred scrubs or so.
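Note that the "gen" counts in those dmesg lines are the device's persistent error statistics, not per-scrub numbers. If your btrfs-progs has it, checking (and, where supported, resetting) them between runs makes it easier to spot when a new error has been recorded - just a suggestion, I didn't use it above:

$ btrfs device stats mnt
$ btrfs device stats -z mnt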

So all in all, it seems the error can be triggered using normal I/O operations plus scrubbing at the right moments - even with a btrfs image in RAM, so hard drive errors can be ruled out.
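For completeness, once the copy/remove loops and the scrub loop have been stopped, the test environment can be torn down again (just the obvious inverse of the setup above):

$ umount mnt
$ umount tmp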

I hope someone can reproduce this and maybe debug it.

Let me have a look at this.

Thanks,
Wang

Best regards
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

