On Mon, Feb 22, 2021 at 2:34 PM Josef Bacik <jo...@toxicpanda.com> wrote:
>
> On 2/21/21 1:27 PM, Neal Gompa wrote:
> > On Wed, Feb 17, 2021 at 11:44 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>
> >> On 2/17/21 11:29 AM, Neal Gompa wrote:
> >>> On Wed, Feb 17, 2021 at 9:59 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>
> >>>> On 2/17/21 9:50 AM, Neal Gompa wrote:
> >>>>> On Wed, Feb 17, 2021 at 9:36 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>
> >>>>>> On 2/16/21 9:05 PM, Neal Gompa wrote:
> >>>>>>> On Tue, Feb 16, 2021 at 4:24 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>>>
> >>>>>>>> On 2/16/21 3:29 PM, Neal Gompa wrote:
> >>>>>>>>> On Tue, Feb 16, 2021 at 1:11 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 2/16/21 11:27 AM, Neal Gompa wrote:
> >>>>>>>>>>> On Tue, Feb 16, 2021 at 10:19 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2/14/21 3:25 PM, Neal Gompa wrote:
> >>>>>>>>>>>>> Hey all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So one of my main computers recently had a disk controller failure
> >>>>>>>>>>>>> that caused my machine to freeze. After rebooting, Btrfs refuses to
> >>>>>>>>>>>>> mount. I tried to do a mount and the following errors show up in the
> >>>>>>>>>>>>> journal:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device sda3): disk space caching is enabled
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device sda3): has skinny extents
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical (device sda3): corrupt leaf: root=401 block=796082176 slot=15 ino=203657, invalid inode transid: has 888896 expect [0, 888895]
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): block=796082176 read time tree block corruption detected
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical (device sda3): corrupt leaf: root=401 block=796082176 slot=15 ino=203657, invalid inode transid: has 888896 expect [0, 888895]
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): block=796082176 read time tree block corruption detected
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS warning (device sda3): couldn't read tree root
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): open_ctree failed
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've tried to do -o recovery,ro mount and get the same issue. I can't
> >>>>>>>>>>>>> seem to find any reasonably good information on how to do recovery in
> >>>>>>>>>>>>> this scenario, even to just recover enough to copy data off.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm on Fedora 33, the system was on Linux kernel version 5.9.16 and
> >>>>>>>>>>>>> the Fedora 33 live ISO I'm using has Linux kernel version 5.10.14. I'm
> >>>>>>>>>>>>> using btrfs-progs v5.10.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can anyone help?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Can you try
> >>>>>>>>>>>>
> >>>>>>>>>>>> btrfs check --clear-space-cache v1 /dev/whatever
> >>>>>>>>>>>>
> >>>>>>>>>>>> That should fix the inode generation thing so it's sane, and then the tree
> >>>>>>>>>>>> checker will allow the fs to be read, hopefully. If not we can work out some
> >>>>>>>>>>>> other magic. Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Josef
> >>>>>>>>>>>
> >>>>>>>>>>> I got the same error as I did with btrfs-check --readonly...
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Oh lovely, what does btrfs check --readonly --backup do?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> No dice...
> >>>>>>>>>
> >>>>>>>>> # btrfs check --readonly --backup /dev/sda3
> >>>>>>>>>> Opening filesystem to check...
> >>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>>>
> >>>>>>>> Hey look the block we're looking for, I wrote you some magic, just pull
> >>>>>>>>
> >>>>>>>> https://github.com/josefbacik/btrfs-progs/tree/for-neal
> >>>>>>>>
> >>>>>>>> build, and then run
> >>>>>>>>
> >>>>>>>> btrfs-neal-magic /dev/sda3 791281664 888895
> >>>>>>>>
> >>>>>>>> This will force us to point at the old root with (hopefully) the right bytenr
> >>>>>>>> and gen, and then hopefully you'll be able to recover from there. This is kind
> >>>>>>>> of saucy, so yolo, but I can undo it if it makes things worse. Thanks,
> >>>>>>>>
> >>>>>>>
> >>>>>>> # btrfs check --readonly /dev/sda3
> >>>>>>>> Opening filesystem to check...
> >>>>>>>> ERROR: could not setup extent tree
> >>>>>>>> ERROR: cannot open file system
> >>>>>>> # btrfs check --clear-space-cache v1 /dev/sda3
> >>>>>>>> Opening filesystem to check...
> >>>>>>>> ERROR: could not setup extent tree
> >>>>>>>> ERROR: cannot open file system
> >>>>>>>
> >>>>>>> It's better, but still no dice... :(
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> Hmm it's not telling us what's wrong with the extent tree, which is annoying.
> >>>>>> Does mount -o rescue=all,ro work now that the root tree is normal? Thanks,
> >>>>>>
> >>>>>
> >>>>> Nope, I see this in the journal:
> >>>>>
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): enabling all of the rescue options
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): ignoring data csums
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): ignoring bad roots
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): disabling log replay at mount time
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): disk space caching is enabled
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): has skinny extents
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): tree level mismatch detected, bytenr=791281664 level expected=1 has=2
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): tree level mismatch detected, bytenr=791281664 level expected=1 has=2
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS warning (device sda3): couldn't read tree root
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): open_ctree failed
> >>>>>
> >>>>>
> >>>>
> >>>> Ok git pull for-neal, rebuild, then run
> >>>>
> >>>> btrfs-neal-magic /dev/sda3 791281664 888895 2
> >>>>
> >>>> I thought of this yesterday but in my head was like "naaahhhh, whats the chances
> >>>> that the level doesn't match??". Thanks,
> >>>>
> >>>
> >>> Tried rescue mount again after running that and got a stack trace in
> >>> the kernel, detailed in the following attached log.
> >>
> >> Huh I wonder how I didn't hit this when testing, I must have only tested with
> >> zero'ing the extent root and the csum root. You're going to have to build a
> >> kernel with a fix for this
> >>
> >> https://paste.centos.org/view/7b48aaea
> >>
> >> and see if that gets you further. Thanks,
> >>
> >
> > I built a kernel build as an RPM with your patch[1] and tried it.
> >
> > [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
> > Killed
> >
> > The log from the journal is attached.
> >
>
> Ahh crud my bad, this should do it
>
> https://paste.centos.org/view/ac2e61ef
>
Patch doesn't apply (note it is patch 667 below):

ngompa@fedora-rawhide-skuldvm ~/f/kernel (rawhide)> fedpkg prep
setting SOURCE_DATE_EPOCH=1613347200
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.f5LOt6
+ umask 022
+ cd /home/ngompa/fedora-scm/kernel
+ patch_command='patch -p1 -F1 -s'
+ cd /home/ngompa/fedora-scm/kernel
+ rm -rf kernel-5.11
+ /usr/bin/mkdir -p kernel-5.11
+ cd kernel-5.11
+ /usr/bin/xz -dc /home/ngompa/fedora-scm/kernel/linux-5.11.tar.xz
+ /usr/bin/tar -xof -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ mv linux-5.11 linux-5.11.0-155.nealbtrfstest.1.fc35.x86_64
+ cd linux-5.11.0-155.nealbtrfstest.1.fc35.x86_64
+ cp -a /home/ngompa/fedora-scm/kernel/Makefile.rhelver .
+ ApplyOptionalPatch patch-5.11.0-redhat.patch
+ local patch=patch-5.11.0-redhat.patch
+ shift
+ '[' '!' -f /home/ngompa/fedora-scm/kernel/patch-5.11.0-redhat.patch ']'
++ awk '{print $1}'
++ wc -l /home/ngompa/fedora-scm/kernel/patch-5.11.0-redhat.patch
+ local C=3166
+ '[' 3166 -gt 9 ']'
+ ApplyPatch patch-5.11.0-redhat.patch
+ local patch=patch-5.11.0-redhat.patch
+ shift
+ '[' '!' -f /home/ngompa/fedora-scm/kernel/patch-5.11.0-redhat.patch ']'
+ case "$patch" in
+ patch -p1 -F1 -s
+ echo 'Patch #666 (linux-5.11-btrfs-handle-null-roots.diff):'
Patch #666 (linux-5.11-btrfs-handle-null-roots.diff):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file fs/btrfs/ctree.c
Hunk #1 succeeded at 2594 (offset -4 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 7282 (offset -166 lines).
+ echo 'Patch #667 (linux-5.11-btrfs-init-devices-more-gracefully.diff):'
Patch #667 (linux-5.11-btrfs-init-devices-more-gracefully.diff):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file fs/btrfs/disk-io.c
patching file fs/btrfs/volumes.c
Hunk #1 FAILED at 7282.
1 out of 1 hunk FAILED -- saving rejects to file fs/btrfs/volumes.c.rej
error: Bad exit status from /var/tmp/rpm-tmp.f5LOt6 (%prep)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.f5LOt6 (%prep)
Could not execute prep: Failed to execute command.

-- 
真実はいつも一つ!/ Always, there's only one truth!
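
For anyone who has to dig through a long %prep trace like the one above, here is a minimal sketch that pulls out which patch and which hunk failed. It is not part of fedpkg or rpmbuild; the function name failed_hunks and the regular expressions are assumptions based only on the log lines quoted in this mail, so other patch(1) versions or locales may print something different.

```python
import re

# Assumed line formats, taken from the log quoted in this mail.
PATCH_RE = re.compile(r"^Patch #(\d+) \((.+)\):$")
FILE_RE = re.compile(r"^patching file (\S+)$")
FAIL_RE = re.compile(r"^Hunk #(\d+) FAILED at (\d+)\.$")


def failed_hunks(log_text):
    """Return (patch_no, patch_name, file, hunk, line) for each failed hunk."""
    failures = []
    patch_no = patch_name = current_file = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = PATCH_RE.match(line)
        if m:
            # A new patch starts; forget the file from the previous one.
            patch_no, patch_name = int(m.group(1)), m.group(2)
            current_file = None
            continue
        m = FILE_RE.match(line)
        if m:
            current_file = m.group(1)
            continue
        m = FAIL_RE.match(line)
        if m:
            failures.append(
                (patch_no, patch_name, current_file,
                 int(m.group(1)), int(m.group(2)))
            )
    return failures


# Excerpt of the trace above, used as sample input.
SAMPLE = """\
Patch #666 (linux-5.11-btrfs-handle-null-roots.diff):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file fs/btrfs/ctree.c
Hunk #1 succeeded at 2594 (offset -4 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 7282 (offset -166 lines).
Patch #667 (linux-5.11-btrfs-init-devices-more-gracefully.diff):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file fs/btrfs/disk-io.c
patching file fs/btrfs/volumes.c
Hunk #1 FAILED at 7282.
"""

for failure in failed_hunks(SAMPLE):
    print(failure)
```

Feeding it the full saved output of fedpkg prep should work the same way, since the "+ echo ..." trace lines do not match the patch-header pattern and are skipped.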