On Sat, Mar 20, 2021 at 5:15 AM Dave T <davestechs...@gmail.com> wrote:
>
> I hope to get  some expert advice before I proceed. I don't want to
> make things worse. Here's my situation now:
>
> This problem is with an external USB drive and it is encrypted.
> cryptsetup open succeeds. But mount fails.
>
>     mount /backup
>     mount: /backup: wrong fs type, bad option, bad superblock on
> /dev/mapper/xusbluks, missing codepage or helper program, or other
> error.
>
>  Next the following command succeeds:
>
>     mount -o ro,recovery /dev/mapper/xusbluks /backup
>
> This is my backup disk (5TB), and I don't have another 5TB disk to
> copy all the data to. I hope I can fix the issue without losing my
> backups.
>
> Next step I did:
>
>         # btrfs check /dev/mapper/xyz
>         Opening filesystem to check...
>         parent transid verify failed on 2853827608576 wanted 29436 found 29433
>         parent transid verify failed on 2853827608576 wanted 29436 found 29433
>         parent transid verify failed on 2853827608576 wanted 29436 found 29433
>         Ignoring transid failure
>         leaf parent key incorrect 2853827608576
>         ERROR: could not setup extent tree
>         ERROR: cannot open file system


From your superblock:

        backup 2:
                backup_tree_root:       2853787942912   gen: 29433      level: 1
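
For reference, all of the backup roots recorded in the superblock can be listed directly; a minimal sketch, assuming the mapper name from your mount output:

```shell
# Print the full superblock, including every backup_tree_root entry
# (bytenr, generation, level), without touching the filesystem:
btrfs inspect-internal dump-super -f /dev/mapper/xusbluks
```

Pick the backup root whose generation is closest to (but below) the wanted transid from the check output.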

Do this:

btrfs check -r 2853787942912 /dev/xyz

If it comes up clean, it's safe to mount with -o usebackuproot, without
needing ro, and in that case it will self-recover. You will lose some
data from between the commits. Partial loss is possible, so it's not
enough to just do a scrub; you'll want to freshen the backups as well,
if a backup was running at the time the trouble happened (the trouble
causing the subsequent transid failures).
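
The whole sequence, sketched out; the mapper name is assumed from your mount output, and the bytenr comes from the backup 2 entry above:

```shell
# 1. Verify the backup root is intact. check is read-only here,
#    so this makes no changes to the filesystem:
btrfs check -r 2853787942912 /dev/mapper/xusbluks

# 2. Only if that comes up clean: mount with the backup roots.
#    btrfs rewinds to the newest usable backup root and recovers.
mount -o usebackuproot /dev/mapper/xusbluks /backup

# 3. Afterwards, verify checksums across all data and metadata:
btrfs scrub start -B /backup
```

scrub only catches blocks whose checksums fail; it can't restore data from the lost commits, which is why refreshing the backups matters.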

Sometimes backup roots are already stale and inconsistent due to
overwrites, so the btrfs check might find problems with that older
root.

What you eventually need to look at is what precipitated the transid
failures, and avoid it. Typical is a drive firmware bug where the
drive gets write ordering wrong and then there's a crash or power
failure. One possible workaround is disabling the drive's write cache
(use a udev rule to make sure it's always applied). Another is to add
a different make/model drive and convert to the raid1 profile; with
luck the two won't have overlapping firmware bugs.
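
A sketch of both mitigations. The device names (sdX, sdY) and the serial in the udev match are placeholders you'd substitute; also note hdparm may not reach drives behind some USB bridges, in which case the write-cache route may not be available:

```shell
# Disable the drive's write cache now (takes effect immediately):
hdparm -W 0 /dev/sdX

# Make it persistent across replug/reboot with a udev rule, matching
# on the drive's serial so the rule follows the device around:
cat > /etc/udev/rules.d/69-usb-backup-wcache.rules <<'EOF'
ACTION=="add", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="XXXXXXXX", \
  RUN+="/usr/sbin/hdparm -W 0 /dev/%k"
EOF

# Alternative mitigation: add a second, different make/model drive
# and convert both data and metadata to the raid1 profile:
btrfs device add /dev/sdY /backup
btrfs balance start -dconvert=raid1 -mconvert=raid1 /backup
```

With raid1, a transid failure on one drive can be repaired from the good copy on the other, as long as the two drives don't fail the same way at the same time.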


-- 
Chris Murphy
