On Mon, Feb 22, 2021 at 2:34 PM Josef Bacik <jo...@toxicpanda.com> wrote:
>
> On 2/21/21 1:27 PM, Neal Gompa wrote:
> > On Wed, Feb 17, 2021 at 11:44 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>
> >> On 2/17/21 11:29 AM, Neal Gompa wrote:
> >>> On Wed, Feb 17, 2021 at 9:59 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>
> >>>> On 2/17/21 9:50 AM, Neal Gompa wrote:
> >>>>> On Wed, Feb 17, 2021 at 9:36 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>
> >>>>>> On 2/16/21 9:05 PM, Neal Gompa wrote:
> >>>>>>> On Tue, Feb 16, 2021 at 4:24 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>>>
> >>>>>>>> On 2/16/21 3:29 PM, Neal Gompa wrote:
> >>>>>>>>> On Tue, Feb 16, 2021 at 1:11 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 2/16/21 11:27 AM, Neal Gompa wrote:
> >>>>>>>>>>> On Tue, Feb 16, 2021 at 10:19 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2/14/21 3:25 PM, Neal Gompa wrote:
> >>>>>>>>>>>>> Hey all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So one of my main computers recently had a disk controller failure
> >>>>>>>>>>>>> that caused my machine to freeze. After rebooting, Btrfs refuses to
> >>>>>>>>>>>>> mount. I tried to do a mount and the following errors show up in the
> >>>>>>>>>>>>> journal:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device sda3): disk space caching is enabled
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device sda3): has skinny extents
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical (device sda3): corrupt leaf: root=401 block=796082176 slot=15 ino=203657, invalid inode transid: has 888896 expect [0, 888895]
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): block=796082176 read time tree block corruption detected
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical (device sda3): corrupt leaf: root=401 block=796082176 slot=15 ino=203657, invalid inode transid: has 888896 expect [0, 888895]
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): block=796082176 read time tree block corruption detected
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS warning (device sda3): couldn't read tree root
> >>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): open_ctree failed
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've tried to do -o recovery,ro mount and get the same issue. I can't
> >>>>>>>>>>>>> seem to find any reasonably good information on how to do recovery in
> >>>>>>>>>>>>> this scenario, even to just recover enough to copy data off.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm on Fedora 33, the system was on Linux kernel version 5.9.16 and
> >>>>>>>>>>>>> the Fedora 33 live ISO I'm using has Linux kernel version 5.10.14. I'm
> >>>>>>>>>>>>> using btrfs-progs v5.10.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can anyone help?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Can you try
> >>>>>>>>>>>>
> >>>>>>>>>>>> btrfs check --clear-space-cache v1 /dev/whatever
> >>>>>>>>>>>>
> >>>>>>>>>>>> That should fix the inode generation thing so it's sane, and then the tree
> >>>>>>>>>>>> checker will allow the fs to be read, hopefully.  If not we can work out some
> >>>>>>>>>>>> other magic.  Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Josef
> >>>>>>>>>>>
> >>>>>>>>>>> I got the same error as I did with btrfs-check --readonly...
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Oh lovely, what does btrfs check --readonly --backup do?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> No dice...
> >>>>>>>>>
> >>>>>>>>> # btrfs check --readonly --backup /dev/sda3
> >>>>>>>>>> Opening filesystem to check...
> >>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>>>
> >>>>>>>> Hey look, the block we're looking for. I wrote you some magic, just pull
> >>>>>>>>
> >>>>>>>> https://github.com/josefbacik/btrfs-progs/tree/for-neal
> >>>>>>>>
> >>>>>>>> build, and then run
> >>>>>>>>
> >>>>>>>> btrfs-neal-magic /dev/sda3 791281664 888895
> >>>>>>>>
> >>>>>>>> This will force us to point at the old root with (hopefully) the right bytenr
> >>>>>>>> and gen, and then hopefully you'll be able to recover from there.  This is kind
> >>>>>>>> of saucy, so yolo, but I can undo it if it makes things worse.  Thanks,
> >>>>>>>>
> >>>>>>>
> >>>>>>> # btrfs check --readonly /dev/sda3
> >>>>>>>> Opening filesystem to check...
> >>>>>>>> ERROR: could not setup extent tree
> >>>>>>>> ERROR: cannot open file system
> >>>>>>> # btrfs check --clear-space-cache v1 /dev/sda3
> >>>>>>>> Opening filesystem to check...
> >>>>>>>> ERROR: could not setup extent tree
> >>>>>>>> ERROR: cannot open file system
> >>>>>>>
> >>>>>>> It's better, but still no dice... :(
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> Hmm it's not telling us what's wrong with the extent tree, which is annoying.
> >>>>>> Does mount -o rescue=all,ro work now that the root tree is normal?  Thanks,
> >>>>>>
> >>>>>
> >>>>> Nope, I see this in the journal:
> >>>>>
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): enabling all of the rescue options
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): ignoring data csums
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): ignoring bad roots
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): disabling log replay at mount time
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): disk space caching is enabled
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): has skinny extents
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): tree level mismatch detected, bytenr=791281664 level expected=1 has=2
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): tree level mismatch detected, bytenr=791281664 level expected=1 has=2
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS warning (device sda3): couldn't read tree root
> >>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): open_ctree failed
> >>>>>
> >>>>>
> >>>>
> >>>> Ok git pull for-neal, rebuild, then run
> >>>>
> >>>> btrfs-neal-magic /dev/sda3 791281664 888895 2
> >>>>
> >>>> I thought of this yesterday but in my head was like "naaahhhh, what are the chances
> >>>> that the level doesn't match??".  Thanks,
> >>>>
> >>>
> >>> Tried rescue mount again after running that and got a stack trace in
> >>> the kernel, detailed in the following attached log.
> >>
> >> Huh I wonder how I didn't hit this when testing, I must have only tested with
> >> zeroing the extent root and the csum root.  You're going to have to build a
> >> kernel with a fix for this
> >>
> >> https://paste.centos.org/view/7b48aaea
> >>
> >> and see if that gets you further.  Thanks,
> >>
> >
> > I built a kernel RPM with your patch[1] and tried it.
> >
> > [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
> > Killed
> >
> > The log from the journal is attached.
>
>
> Ahh crud my bad, this should do it
>
> https://paste.centos.org/view/ac2e61ef
>

The new patch doesn't apply: its fs/btrfs/volumes.c hunk is rejected (it is Patch #667 in the log below):

ngompa@fedora-rawhide-skuldvm ~/f/kernel (rawhide)> fedpkg prep

setting SOURCE_DATE_EPOCH=1613347200
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.f5LOt6
+ umask 022
+ cd /home/ngompa/fedora-scm/kernel
+ patch_command='patch -p1 -F1 -s'
+ cd /home/ngompa/fedora-scm/kernel
+ rm -rf kernel-5.11
+ /usr/bin/mkdir -p kernel-5.11
+ cd kernel-5.11
+ /usr/bin/xz -dc /home/ngompa/fedora-scm/kernel/linux-5.11.tar.xz
+ /usr/bin/tar -xof -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ mv linux-5.11 linux-5.11.0-155.nealbtrfstest.1.fc35.x86_64
+ cd linux-5.11.0-155.nealbtrfstest.1.fc35.x86_64
+ cp -a /home/ngompa/fedora-scm/kernel/Makefile.rhelver .
+ ApplyOptionalPatch patch-5.11.0-redhat.patch
+ local patch=patch-5.11.0-redhat.patch
+ shift
+ '[' '!' -f /home/ngompa/fedora-scm/kernel/patch-5.11.0-redhat.patch ']'
++ awk '{print $1}'
++ wc -l /home/ngompa/fedora-scm/kernel/patch-5.11.0-redhat.patch
+ local C=3166
+ '[' 3166 -gt 9 ']'
+ ApplyPatch patch-5.11.0-redhat.patch
+ local patch=patch-5.11.0-redhat.patch
+ shift
+ '[' '!' -f /home/ngompa/fedora-scm/kernel/patch-5.11.0-redhat.patch ']'
+ case "$patch" in
+ patch -p1 -F1 -s
+ echo 'Patch #666 (linux-5.11-btrfs-handle-null-roots.diff):'
Patch #666 (linux-5.11-btrfs-handle-null-roots.diff):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file fs/btrfs/ctree.c
Hunk #1 succeeded at 2594 (offset -4 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 7282 (offset -166 lines).
+ echo 'Patch #667 (linux-5.11-btrfs-init-devices-more-gracefully.diff):'
Patch #667 (linux-5.11-btrfs-init-devices-more-gracefully.diff):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file fs/btrfs/disk-io.c
patching file fs/btrfs/volumes.c
Hunk #1 FAILED at 7282.
1 out of 1 hunk FAILED -- saving rejects to file fs/btrfs/volumes.c.rej
error: Bad exit status from /var/tmp/rpm-tmp.f5LOt6 (%prep)


RPM build errors:
   Bad exit status from /var/tmp/rpm-tmp.f5LOt6 (%prep)
Could not execute prep: Failed to execute command.
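
For anyone reading a long %prep log like the one above, the failing patch can be picked out mechanically. This is only a minimal sketch and not something from the original exchange: the function name find_failed_patch and the temporary sample file are made up for illustration, and the sample lines are copied from the log above. It assumes the log contains the "Patch #N (name):", "patching file" and "Hunk #N FAILED" lines in the form seen here.

```shell
#!/bin/sh
# Sketch: report which numbered patch failed in a saved %prep log.
# find_failed_patch is a hypothetical helper name, not an rpm or fedpkg tool.

find_failed_patch() {
    awk '
        /^Patch #[0-9]+ \(/    { banner = $0 }   # most recent patch banner
        /^patching file /      { file = $3 }     # file currently being patched
        /^Hunk #[0-9]+ FAILED/ { print banner; print "  " file ": " $0 }
    ' "$1"
}

# Sample input: an excerpt of the log above, saved to a scratch file.
log=$(mktemp)
cat > "$log" <<'EOF'
+ echo 'Patch #666 (linux-5.11-btrfs-handle-null-roots.diff):'
Patch #666 (linux-5.11-btrfs-handle-null-roots.diff):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file fs/btrfs/ctree.c
Hunk #1 succeeded at 2594 (offset -4 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 7282 (offset -166 lines).
+ echo 'Patch #667 (linux-5.11-btrfs-init-devices-more-gracefully.diff):'
Patch #667 (linux-5.11-btrfs-init-devices-more-gracefully.diff):
+ /usr/bin/patch --no-backup-if-mismatch -p1 --fuzz=0
patching file fs/btrfs/disk-io.c
patching file fs/btrfs/volumes.c
Hunk #1 FAILED at 7282.
EOF

result=$(find_failed_patch "$log")
printf '%s\n' "$result"
rm -f "$log"
```

On this excerpt it prints the Patch #667 banner followed by the rejected fs/btrfs/volumes.c hunk. Both diffs touch fs/btrfs/volumes.c around line 7282, so one possible reading (not confirmed anywhere in this thread) is that the second diff was made against a tree that did not have the first one applied. GNU patch's --dry-run option can be used to check a diff against the prepared tree without modifying it.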





-- 
真実はいつも一つ!/ Always, there's only one truth!
