On 2018-04-14 15:31, Timo Nentwig wrote:
> Hi!
> 
> btrfs remounted itself ro during operation (don't have the dmesg) and
> fails to mount after reboot.

Please share the mount options you normally use and, if possible, the
hardware model of the disk behind sda2.

The workload that was running when the filesystem went RO would also help.
(The dmesg from the moment it went RO would be best, though.)
> 
> Any advice?
> 
> 
> 4.15.15-1-ARCH #1 SMP PREEMPT Sat Mar 31 23:59:25 UTC 2018 x86_64 GNU/Linux
> btrfs-progs v4.16
> Label: '830'  uuid: 22e778f7-2499-4379-99d2-cdd399d1cc6e
>         Total devices 1 FS bytes used 34.39GiB
>         devid    1 size 59.49GiB used 58.98GiB path /dev/sda2
> 
> [  867.041397] BTRFS info (device sda2): disk space caching is enabled
> [  867.185357] BTRFS info (device sda2): bdev /dev/sda2 errs: wr 0, rd
> 158, flush 0, corrupt 0, gen 0
> [  868.423427] BTRFS error (device sda2): parent transid verify failed
> on 166030671872 wanted 1702074 found 1705980

Another metadata corruption.
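
For anyone reading along: a parent node records the generation (transid)
it expects for each child block it points to, and this error means the
block found at that bytenr carries a different generation. Below is a
minimal shell sketch that pulls the three numbers out of such a line; the
helper name parse_transid is made up for illustration and is not part of
btrfs-progs.

```shell
#!/bin/sh
# Hypothetical helper: extract bytenr, wanted and found generation from a
# "parent transid verify failed" line and print how far apart they are.
parse_transid() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "on")     bytenr = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        printf "bytenr=%s wanted=%s found=%s delta=%d\n", bytenr, wanted, found, found - wanted
    }'
}

parse_transid "parent transid verify failed on 166030671872 wanted 1702074 found 1705980"
```

In this report the block on disk is 3906 generations newer than its parent
expects.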

> [  868.425400] BTRFS error (device sda2): failed to read block groups: -5

The extent tree is corrupted.
If your objective is just to salvage data, this at least means your data
should be mostly safe (as long as there is no other metadata corruption).

> 
> # sudo btrfs check /dev/sda2
> parent transid verify failed on 166030671872 wanted 1702074 found 1705980
> parent transid verify failed on 166030671872 wanted 1702074 found 1705980
> Ignoring transid failure
> leaf parent key incorrect 166030671872
> ERROR: cannot open file system

I could enhance btrfs check to continue checking, but it may need some
time to get rid of all btrfs_block_group_cache usage in that case.

> # sudo btrfs-debug-tree -b 166030671872 /dev/sda2

This works because, with the -b option, (current) btrfs-progs does not
try to read block groups any more, so it can continue.

> btrfs-progs v4.16
> parent transid verify failed on 166030671872 wanted 1702074 found 1705980
> parent transid verify failed on 166030671872 wanted 1702074 found 1705980
> Ignoring transid failure
> leaf parent key incorrect 166030671872
> leaf 166030671872 items 60 free space 95 generation 1705980 owner TREE_LOG

This output is interesting and quite helpful.

The tree block causing the problem belongs to TREE_LOG, while according
to the kernel dmesg it should belong to the extent tree.

This suggests the tree log code is more or less involved in this situation.

> leaf 166030671872 flags 0x1(WRITTEN) backref revision 1
> fs uuid 22e778f7-2499-4379-99d2-cdd399d1cc6e
> chunk uuid bee8ad15-e128-45f1-a3d7-e2fda17806ce
>     item 0 key (46475 DIR_ITEM 2335543231) itemoff 3955 itemsize 40
>         location key (1973554 INODE_ITEM 0) type FILE
>         transid 1705438 data_len 0 name_len 10
>         name: 002443.sst
> 
>

If your primary objective is to salvage data, please apply these patches
on top of v4.16:
https://patchwork.kernel.org/patch/10334955/
https://patchwork.kernel.org/patch/10334957/

Then try btrfs restore to salvage as much data as possible.
In the best case (no other tree block corruption), it should salvage
all your data.
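
A rough sketch of how the restore could be driven, assuming a hypothetical
destination /mnt/rescue on a different, healthy filesystem with enough
free space. By default the script only prints the commands so they can be
reviewed; set RUN=1 to actually execute them. -D (dry run), -i (ignore
errors) and -v (verbose) are existing btrfs restore options.

```shell
#!/bin/sh
# DEV is the device from this thread; DEST is a hypothetical destination.
DEV=${DEV:-/dev/sda2}
DEST=${DEST:-/mnt/rescue}

# Print the command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

salvage() {
    # 1) Dry run: only list what would be restored, write nothing.
    run btrfs restore -D -v "$DEV" "$DEST" &&
    # 2) Real run: -i keeps going past errors on individual files.
    run btrfs restore -i -v "$DEV" "$DEST"
}

salvage
```

Doing the dry run first gives an idea of how much is reachable before
anything is written to the destination.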

Apart from the salvage method above, please also consider providing the
following data, as your case is pretty special and may help us catch
a long-hidden bug.

1) Extent tree dump
   This needs the above 2 patches applied first.

   # btrfs inspect dump-tree -t extent /dev/sda2 &> \
     /tmp/extent_tree_dump
   If the dump is too large, the output of "grep -C20 166030671872" on it
   is also good enough.

   This would help us inspect the backrefs of that tree block to see if
   we can find anything wrong.

2) super block dump
   # btrfs inspect dump-super -f /dev/sda2

3) Extra hardware info about your sda
   Things like SMART and hardware model would also help here.

4) The mount options of /dev/sda2
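
The four items above could be gathered roughly as follows. This is only a
sketch: the output directory /tmp/btrfs-report, the use of smartctl (from
smartmontools) for item 3, and reading /etc/fstab for item 4 are
assumptions, not something taken from this thread. Nothing is collected
unless RUN=1 is set and the device exists; run it as root.

```shell
#!/bin/sh
# DEV is from this thread; DISK, OUT, smartctl and /etc/fstab are assumptions.
DEV=${DEV:-/dev/sda2}
DISK=${DISK:-/dev/sda}
OUT=${OUT:-/tmp/btrfs-report}
BYTENR=166030671872

# Keep only 20 lines of context around the problematic bytenr.
trim_dump() {
    grep -C20 -- "$2" "$1"
}

collect() {
    mkdir -p "$OUT" || return 1
    # 1) Extent tree dump (needs the two patches applied), plus a trimmed copy.
    #    "inspect-internal" is the unabbreviated form of "inspect".
    btrfs inspect-internal dump-tree -t extent "$DEV" > "$OUT/extent_tree_dump" 2>&1
    trim_dump "$OUT/extent_tree_dump" "$BYTENR" > "$OUT/extent_tree_dump.trimmed"
    # 2) Superblock dump.
    btrfs inspect-internal dump-super -f "$DEV" > "$OUT/super_dump" 2>&1
    # 3) SMART data and hardware model of the whole disk.
    smartctl -a "$DISK" > "$OUT/smart" 2>&1
    # 4) Usual mount options, as configured in fstab.
    grep -i btrfs /etc/fstab > "$OUT/mount_options"
}

# Only touch the device when explicitly asked to.
if [ "${RUN:-0}" = "1" ] && [ -b "$DEV" ]; then
    collect
fi
```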

Thanks,
Qu
