On Mon, Mar 22, 2021 at 12:32 AM Dave T <davestechs...@gmail.com> wrote:
>
> On Sun, Mar 21, 2021 at 2:03 PM Chris Murphy <li...@colorremedies.com> wrote:
> >
> > On Sat, Mar 20, 2021 at 11:54 PM Dave T <davestechs...@gmail.com> wrote:
> > >
> > > # btrfs check -r 2853787942912 /dev/mapper/xyz
> > > Opening filesystem to check...
> > > parent transid verify failed on 2853787942912 wanted 29436 found 29433
> > > parent transid verify failed on 2853787942912 wanted 29436 found 29433
> > > parent transid verify failed on 2853787942912 wanted 29436 found 29433
> > > Ignoring transid failure
> > > parent transid verify failed on 2853827723264 wanted 29433 found 29435
> > > parent transid verify failed on 2853827723264 wanted 29433 found 29435
> > > parent transid verify failed on 2853827723264 wanted 29433 found 29435
> > > Ignoring transid failure
> > > leaf parent key incorrect 2853827723264
> > > ERROR: could not setup extent tree
> > > ERROR: cannot open file system
> >
> > btrfs insp dump-t -t 2853827723264 /dev/
>
> # btrfs insp dump-t -t 2853827723264 /dev/mapper/xzy
> btrfs-progs v5.11
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> Ignoring transid failure
> leaf parent key incorrect 2853827608576
> WARNING: could not setup extent tree, skipping it
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> Ignoring transid failure
> leaf parent key incorrect 2853827608576
> Couldn't setup device tree
> ERROR: unable to open /dev/mapper/xzy
>
> # btrfs insp dump-t -t 2853787942912 /dev/mapper/xzy
> btrfs-progs v5.11
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> Ignoring transid failure
> leaf parent key incorrect 2853827608576
> WARNING: could not setup extent tree, skipping it
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> Ignoring transid failure
> leaf parent key incorrect 2853827608576
> Couldn't setup device tree
> ERROR: unable to open /dev/mapper/xzy
>
> # btrfs insp dump-t -t 2853827608576 /dev/mapper/xzy
> btrfs-progs v5.11
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> Ignoring transid failure
> leaf parent key incorrect 2853827608576
> WARNING: could not setup extent tree, skipping it
> parent transid verify failed on 2853827608576 wanted 29436 found 29433
> Ignoring transid failure
> leaf parent key incorrect 2853827608576
> Couldn't setup device tree
> ERROR: unable to open /dev/mapper/xzy

That does not look promising. I don't know whether a read-write mount
with usebackuproot will recover, or end up with problems.

Options:

a. btrfs check --repair
This probably fails on the same problem, it can't setup the extent tree.

b. btrfs check --init-extent-tree
This is a heavy hammer, it might succeed, but takes a long time. On 5T
it might take double digit hours or even single digit days. It's
generally faster to just wipe the drive and restore from backups than
use init-extent-tree (I understand this *is* your backup).

c. Setup an overlay file on device mapper, to redirect the writes from
a read-write mount with usebackup root. I think it's sufficient to
just mount, optionally write some files (empty or not), and umount.
Then do a btrfs check to see if the current tree is healthy.
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

That guide is a bit complex to deal with many drives with mdadm raid,
so you can simplify it for just one drive. The gist is no writes go to
the drive itself, it's treated as read-only by device-mapper (in fact
you can optionally add a pre-step with the blockdev command and
--setro to make sure the entire drive is read-only; just make sure to
make it rw once you're done testing). All the writes with this overlay
go into a loop mounted file which you intentionally just throw away
after testing.

d. Just skip the testing and try usebackuproot with a read-write
mount. It might make things worse, but at least it's fast to test. If
it messes things up, you'll have to recreate this backup from scratch.

As for how to prevent this? I'm not sure. About the best we can do is
disable the drive write cache with a udev rule, and/or raid1 with
another make/model drive, and let Btrfs detect occasional corruption
and self heal from the good copy. Another obvious way to avoid the
problem is, stop having power failures, crashes, and accidental USB
cable disconnections :)

It's not any one thing that's the problem. It's a sequence of problems
happening in just the right (or wrong) order that causes the problem.
Bugs + mistake + bad luck = problem.

-- 
Chris Murphy

Reply via email to