I apologize for the bad line wrapping in my last post; I will be
setting up mutt soon.

This is the final result for the offline scrub:
Doing offline scrub [O] [681/683]
Scrub result:
Tree bytes scrubbed: 5234491392
Tree extents scrubbed: 638975
Data bytes scrubbed: 4353723572224
Data extents scrubbed: 374300
Data bytes without csum: 533200896
Read error: 0
Verify error: 0
Csum error: 175

The offline scrub apparently corrected some metadata extents while
scanning /dev/sdn.


I also ran the online scrub directly on /dev/sdn; it reported 0 errors:

$ btrfs scrub status /dev/sdn
scrub status for 88406942-e3e1-42c6-ad71-e23bb315caa7
        scrub started at Tue Oct 24 06:55:12 2017 and finished after 01:52:44
        total bytes scrubbed: 677.35GiB with 0 errors

The csum mismatches are still missed by the online scrub when it is
run on a single <device>. I am now running the offline scrub on the
other devices to see whether they are clean.

$ lsblk -o +SERIAL
NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT SERIAL
sdh         8:112  0  1.8T  0 disk             WD-WMAZA370XXXX
sdi         8:128  0  1.8T  0 disk             WD-WCAZA569XXXX
sdn         8:208  0  1.8T  0 disk             WD-WCAZA580XXXX

$ btrfs scrub start --offline --progress /dev/sdh
ERROR: data at bytenr 5365456896 ...
ERROR: extent 5341712384 ...
...

One thing to note is that csum errors are also being detected on
/dev/sdh, even though it has never been mentioned in dmesg. I
understand that you may not be able to run two offline checks at once,
but the error message I get is slightly misleading.

$ btrfs scrub start --offline --progress /dev/sdi
ERROR: cannot open device '/dev/sdn': Device or resource busy
ERROR: cannot open file system

I get an error about sdn when the device I am trying to scan is sdi,
and the device that is currently being scanned is sdh.
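
To compare the devices, the offline scrub ERROR lines can be summarized
mechanically. Below is a minimal sketch in Python, assuming the line
format shown in the /dev/sdn output quoted below, with one message per
line; the function names are hypothetical and my own. It groups each
"data at bytenr" csum mismatch under the "extent ... CORRUPTED" line
that follows it, and builds one `btrfs inspect-internal logical-resolve`
command per failing block as a possible route to question 1 below. That
last part assumes the printed bytenr values are logical addresses; I
have not verified this.

```python
import re
from typing import List, NamedTuple, Tuple

# Hypothetical helper for offline-scrub output. Assumption: each ERROR message
# occupies one line (the copies in this thread were wrapped by my mail client).
DATA_RE = re.compile(
    r"ERROR: data at bytenr (\d+) mirror (\d+) csum mismatch, "
    r"have (0x[0-9a-f]+) expect (0x[0-9a-f]+)"
)
EXTENT_RE = re.compile(r"ERROR: extent (\d+) len (\d+) CORRUPTED")


class Extent(NamedTuple):
    start: int                   # extent bytenr as printed by the scrub
    length: int                  # extent length in bytes
    bad_blocks: Tuple[int, ...]  # "data at bytenr" values reported before it


def parse_offline_scrub(text: str) -> List[Extent]:
    """Attach each csum-mismatch line to the next 'extent ... CORRUPTED' line.

    Mismatch lines not followed by an extent line (e.g. a truncated log) are dropped.
    """
    extents: List[Extent] = []
    pending: List[int] = []
    for line in text.splitlines():
        data = DATA_RE.search(line)
        if data:
            pending.append(int(data.group(1)))
            continue
        ext = EXTENT_RE.search(line)
        if ext:
            extents.append(Extent(int(ext.group(1)), int(ext.group(2)), tuple(pending)))
            pending = []
    return extents


def resolve_commands(extents: List[Extent], mountpoint: str = "/mnt") -> List[str]:
    """Build (but do not run) one logical-resolve command per failing block.

    Assumes the printed bytenr values are btrfs logical addresses.
    """
    return [
        f"btrfs inspect-internal logical-resolve {block} {mountpoint}"
        for extent in extents
        for block in extent.bad_blocks
    ]


# Excerpt of the /dev/sdn output quoted below, unwrapped to one message per line.
SAMPLE = """\
ERROR: data at bytenr 68431499264 mirror 1 csum mismatch, have 0x5aa0d40f expect 0xd4a15873
ERROR: extent 68431474688 len 14467072 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 175643619328 mirror 1 csum mismatch, have 0x1e168ca1 expect 0xd413b1e0
ERROR: data at bytenr 175643754496 mirror 1 csum mismatch, have 0x6cfdc8ae expect 0xa6f8f5ef
ERROR: extent 175640539136 len 6381568 CORRUPTED, all mirror(s) corrupted, can't be repaired
"""

if __name__ == "__main__":
    found = parse_offline_scrub(SAMPLE)
    total = sum(e.length for e in found)
    print(f"{len(found)} corrupted extents, {total} bytes")
    for cmd in resolve_commands(found):
        print(cmd)
```

Feeding it the saved output of each device's offline scrub would make
the sdh, sdi, and sdn results easy to compare side by side.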

On Tue, Oct 24, 2017 at 2:00 AM, Zak Kohler <y...@y2kbugger.com> wrote:
> Yes, it is finding much more than just one error.
>
> From dmesg:
> [89520.441354] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
>
> $ sudo btrfs scrub start --offline --progress /dev/sdn
> ERROR: data at bytenr 68431499264 mirror 1 csum mismatch, have 0x5aa0d40f expect 0xd4a15873
> ERROR: extent 68431474688 len 14467072 CORRUPTED, all mirror(s) corrupted, can't be repaired
> ERROR: data at bytenr 83646357504 mirror 1 csum mismatch, have 0xfc0baabe expect 0x7f9cb681
> ERROR: extent 83519741952 len 134217728 CORRUPTED, all mirror(s) corrupted, can't be repaired
> ERROR: data at bytenr 121936633856 mirror 1 csum mismatch, have 0x507016a5 expect 0x50609afe
> ERROR: extent 121858334720 len 134217728 CORRUPTED, all mirror(s) corrupted, can't be repaired
> ERROR: data at bytenr 144872591360 mirror 1 csum mismatch, have 0x33964d73 expect 0xf9937032
> ERROR: extent 144822386688 len 61231104 CORRUPTED, all mirror(s) corrupted, can't be repaired
> ERROR: data at bytenr 167961075712 mirror 1 csum mismatch, have 0xf43bd0e3 expect 0x5be589bb
> ERROR: extent 167950999552 len 27537408 CORRUPTED, all mirror(s) corrupted, can't be repaired
> ERROR: data at bytenr 175643619328 mirror 1 csum mismatch, have 0x1e168ca1 expect 0xd413b1e0
> ERROR: data at bytenr 175643754496 mirror 1 csum mismatch, have 0x6cfdc8ae expect 0xa6f8f5ef
> ERROR: extent 175640539136 len 6381568 CORRUPTED, all mirror(s) corrupted, can't be repaired
> ERROR: data at bytenr 183316750336 mirror 1 csum mismatch, have 0x145bdf76 expect 0x7390565e
> .....
> and the list goes on.
>
>
> Questions:
> 1. Using "find /mnt -inum 4708" I can link the dmesg warning to a
> specific file. Is there a way to link the --offline ERRORs above to
> the inode?
>
> 2. How could "btrfs device stats /mnt" and a normal full scrub fail to
> detect the csum errors?
>
> 3. Do these errors appear to be hardware failure (despite pristine
> SMART), user error on volume creation/mounting, or an actual btrfs
> issue? I feel that the need for question #1 indicates a problem with
> btrfs regardless of whether there is a real hardware failure or not.
>
>
> Next I will try an online scrub of only the sdn device, since before
> I was running a scrub of the full filesystem.
>
> On Tue, Oct 24, 2017 at 12:52 AM, Lakshmipathi.G <lakshmipath...@gmail.com> wrote:
>>> Does anyone know why scrub did not catch these errors that show up in dmesg?
>>
>> Can you try offline scrub from this repo
>> https://github.com/gujx2017/btrfs-progs/tree/offline_scrub and see
>> whether it detects the issue? "btrfs scrub start --offline <dev>"
>>
>>
>> ----
>> Cheers,
>> Lakshmipathi.G
>> http://www.giis.co.in http://www.webminal.org