On Wed, Feb 17, 2021 at 11:44 AM Josef Bacik <jo...@toxicpanda.com> wrote:
>
> On 2/17/21 11:29 AM, Neal Gompa wrote:
> > On Wed, Feb 17, 2021 at 9:59 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>
> >> On 2/17/21 9:50 AM, Neal Gompa wrote:
> >>> On Wed, Feb 17, 2021 at 9:36 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>
> >>>> On 2/16/21 9:05 PM, Neal Gompa wrote:
> >>>>> On Tue, Feb 16, 2021 at 4:24 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>
> >>>>>> On 2/16/21 3:29 PM, Neal Gompa wrote:
> >>>>>>> On Tue, Feb 16, 2021 at 1:11 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>>>
> >>>>>>>> On 2/16/21 11:27 AM, Neal Gompa wrote:
> >>>>>>>>> On Tue, Feb 16, 2021 at 10:19 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 2/14/21 3:25 PM, Neal Gompa wrote:
> >>>>>>>>>>> Hey all,
> >>>>>>>>>>>
> >>>>>>>>>>> So one of my main computers recently had a disk controller failure
> >>>>>>>>>>> that caused my machine to freeze. After rebooting, Btrfs refuses to
> >>>>>>>>>>> mount. I tried to do a mount and the following errors show up in the
> >>>>>>>>>>> journal:
> >>>>>>>>>>>
> >>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device sda3): disk space caching is enabled
> >>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device sda3): has skinny extents
> >>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical (device sda3): corrupt leaf: root=401 block=796082176 slot=15 ino=203657, invalid inode transid: has 888896 expect [0, 888895]
> >>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): block=796082176 read time tree block corruption detected
> >>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical (device sda3): corrupt leaf: root=401 block=796082176 slot=15 ino=203657, invalid inode transid: has 888896 expect [0, 888895]
> >>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): block=796082176 read time tree block corruption detected
> >>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS warning (device sda3): couldn't read tree root
> >>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device sda3): open_ctree failed
> >>>>>>>>>>>
> >>>>>>>>>>> I've tried to do -o recovery,ro mount and get the same issue. I can't
> >>>>>>>>>>> seem to find any reasonably good information on how to do recovery in
> >>>>>>>>>>> this scenario, even to just recover enough to copy data off.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm on Fedora 33, the system was on Linux kernel version 5.9.16 and
> >>>>>>>>>>> the Fedora 33 live ISO I'm using has Linux kernel version 5.10.14. I'm
> >>>>>>>>>>> using btrfs-progs v5.10.
> >>>>>>>>>>>
> >>>>>>>>>>> Can anyone help?
> >>>>>>>>>>
> >>>>>>>>>> Can you try
> >>>>>>>>>>
> >>>>>>>>>> btrfs check --clear-space-cache v1 /dev/whatever
> >>>>>>>>>>
> >>>>>>>>>> That should fix the inode generation thing so it's sane, and then the tree
> >>>>>>>>>> checker will allow the fs to be read, hopefully. If not we can work out some
> >>>>>>>>>> other magic. Thanks,
> >>>>>>>>>>
> >>>>>>>>>> Josef
> >>>>>>>>>
> >>>>>>>>> I got the same error as I did with btrfs-check --readonly...
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Oh lovely, what does btrfs check --readonly --backup do?
> >>>>>>>>
> >>>>>>>
> >>>>>>> No dice...
> >>>>>>>
> >>>>>>> # btrfs check --readonly --backup /dev/sda3
> >>>>>>>> Opening filesystem to check...
> >>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 888895
> >>>>>>
> >>>>>> Hey look the block we're looking for, I wrote you some magic, just pull
> >>>>>>
> >>>>>> https://github.com/josefbacik/btrfs-progs/tree/for-neal
> >>>>>>
> >>>>>> build, and then run
> >>>>>>
> >>>>>> btrfs-neal-magic /dev/sda3 791281664 888895
> >>>>>>
> >>>>>> This will force us to point at the old root with (hopefully) the right bytenr
> >>>>>> and gen, and then hopefully you'll be able to recover from there. This is kind
> >>>>>> of saucy, so yolo, but I can undo it if it makes things worse. Thanks,
> >>>>>>
> >>>>>
> >>>>> # btrfs check --readonly /dev/sda3
> >>>>>> Opening filesystem to check...
> >>>>>> ERROR: could not setup extent tree
> >>>>>> ERROR: cannot open file system
> >>>>> # btrfs check --clear-space-cache v1 /dev/sda3
> >>>>>> Opening filesystem to check...
> >>>>>> ERROR: could not setup extent tree
> >>>>>> ERROR: cannot open file system
> >>>>>
> >>>>> It's better, but still no dice... :(
> >>>>>
> >>>>>
> >>>>
> >>>> Hmm it's not telling us what's wrong with the extent tree, which is annoying.
> >>>> Does mount -o rescue=all,ro work now that the root tree is normal? Thanks,
> >>>>
> >>>
> >>> Nope, I see this in the journal:
> >>>
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): enabling all of the rescue options
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): ignoring data csums
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): ignoring bad roots
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): disabling log replay at mount time
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): disk space caching is enabled
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): has skinny extents
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): tree level mismatch detected, bytenr=791281664 level expected=1 has=2
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): tree level mismatch detected, bytenr=791281664 level expected=1 has=2
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS warning (device sda3): couldn't read tree root
> >>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): open_ctree failed
> >>>
> >>>
> >>
> >> Ok git pull for-neal, rebuild, then run
> >>
> >> btrfs-neal-magic /dev/sda3 791281664 888895 2
> >>
> >> I thought of this yesterday but in my head was like "naaahhhh, whats the chances
> >> that the level doesn't match??". Thanks,
> >>
> >
> > Tried rescue mount again after running that and got a stack trace in
> > the kernel, detailed in the following attached log.
>
> Huh I wonder how I didn't hit this when testing, I must have only tested with
> zero'ing the extent root and the csum root. You're going to have to build a
> kernel with a fix for this
>
> https://paste.centos.org/view/7b48aaea
>
> and see if that gets you further. Thanks,
>

I built a kernel as an RPM with your patch[1] and tried it.

[root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
Killed

The log from the journal is attached.

[1]: https://download.copr.fedorainfracloud.org/results/ngompa/btrfs-progs-neal-magic/fedora-34-x86_64/01987802-kernel/

--
真実はいつも一つ!/ Always, there's only one truth!

Feb 21 13:18:58 fedora kernel: BTRFS info (device sdb3): enabling all of the rescue options
Feb 21 13:18:58 fedora kernel: BTRFS info (device sdb3): ignoring data csums
Feb 21 13:18:58 fedora kernel: BTRFS info (device sdb3): ignoring bad roots
Feb 21 13:18:58 fedora kernel: BTRFS info (device sdb3): disabling log replay at mount time
Feb 21 13:18:58 fedora kernel: BTRFS info (device sdb3): disk space caching is enabled
Feb 21 13:18:58 fedora kernel: BTRFS info (device sdb3): has skinny extents
Feb 21 13:18:58 fedora kernel: BUG: kernel NULL pointer dereference, address: 0000000000000030
Feb 21 13:18:58 fedora kernel: #PF: supervisor read access in kernel mode
Feb 21 13:18:58 fedora kernel: #PF: error_code(0x0000) - not-present page
Feb 21 13:18:58 fedora kernel: PGD 0 P4D 0
Feb 21 13:18:58 fedora kernel: Oops: 0000 [#1] SMP PTI
Feb 21 13:18:58 fedora kernel: CPU: 1 PID: 1590 Comm: mount Not tainted 5.11.0-155.nealbtrfstest.fc34.x86_64 #1
Feb 21 13:18:58 fedora kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
Feb 21 13:18:58 fedora kernel: RIP: 0010:btrfs_device_init_dev_stats+0x26/0x210
Feb 21 13:18:58 fedora kernel: Code: 0f 1f 40 00 0f 1f 44 00 00 41 57 49 89 f7 41 56 41 55 45 31 ed 41 54 55 53 48 83 ec 40 48 8b 47 38 48 c7 44 24 2f 00 00 00 00 <48> 8b 70 30 c6 44 24 3f 00 48 c7 44 24 37 00 00 00 00 48 85 f6 74
Feb 21 13:18:58 fedora kernel: RSP: 0018:ffffb477c3ce3b68 EFLAGS: 00010282
Feb 21 13:18:58 fedora kernel: RAX: 0000000000000000 RBX: ffff8ad3094ea098 RCX: 0000000000000070
Feb 21 13:18:58 fedora kernel: RDX: ffff8ad31b728000 RSI: ffff8ad328c8c2a0 RDI: ffff8ad3094eac00
Feb 21 13:18:58 fedora kernel: RBP: ffff8ad328c8c2a0 R08: 0000000000000070 R09: 0000000000000000
Feb 21 13:18:58 fedora kernel: R10: ffff8ad328c8c2a0 R11: 0000000000000000 R12: ffff8ad3094ea000
Feb 21 13:18:58 fedora kernel: R13: 0000000000000000 R14: ffff8ad3094eac00 R15: ffff8ad328c8c2a0
Feb 21 13:18:58 fedora kernel: FS: 00007fda4979fc40(0000) GS:ffff8ad37be40000(0000) knlGS:0000000000000000
Feb 21 13:18:58 fedora kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 21 13:18:58 fedora kernel: CR2: 0000000000000030 CR3: 000000005faca001 CR4: 00000000003706e0
Feb 21 13:18:58 fedora kernel: Call Trace:
Feb 21 13:18:58 fedora kernel: ? btrfs_init_dev_stats+0x1f/0xf0
Feb 21 13:18:58 fedora kernel: btrfs_init_dev_stats+0x62/0xf0
Feb 21 13:18:58 fedora kernel: open_ctree+0x102c/0x1610
Feb 21 13:18:58 fedora kernel: btrfs_mount_root.cold+0x13/0xfa
Feb 21 13:18:58 fedora kernel: legacy_get_tree+0x27/0x40
Feb 21 13:18:58 fedora kernel: vfs_get_tree+0x25/0xb0
Feb 21 13:18:58 fedora kernel: vfs_kern_mount.part.0+0x71/0xb0
Feb 21 13:18:58 fedora kernel: btrfs_mount+0x131/0x3d0
Feb 21 13:18:58 fedora kernel: ? legacy_get_tree+0x27/0x40
Feb 21 13:18:58 fedora kernel: ? btrfs_show_options+0x640/0x640
Feb 21 13:18:58 fedora kernel: legacy_get_tree+0x27/0x40
Feb 21 13:18:58 fedora kernel: vfs_get_tree+0x25/0xb0
Feb 21 13:18:58 fedora kernel: path_mount+0x441/0xa80
Feb 21 13:18:58 fedora kernel: __x64_sys_mount+0xf4/0x130
Feb 21 13:18:58 fedora kernel: do_syscall_64+0x33/0x40
Feb 21 13:18:58 fedora kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 21 13:18:58 fedora kernel: RIP: 0033:0x7fda499cf52e
Feb 21 13:18:58 fedora kernel: Code: 48 8b 0d 45 19 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 12 19 0c 00 f7 d8 64 89 01 48
Feb 21 13:18:58 fedora kernel: RSP: 002b:00007ffc12dff688 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
Feb 21 13:18:58 fedora kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fda499cf52e
Feb 21 13:18:58 fedora kernel: RDX: 000055e2fc6f3690 RSI: 000055e2fc6f3730 RDI: 000055e2fc6f36b0
Feb 21 13:18:58 fedora kernel: RBP: 000055e2fc6f3460 R08: 000055e2fc6f36f0 R09: 00007fda49a91a60
Feb 21 13:18:58 fedora kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
Feb 21 13:18:58 fedora kernel: R13: 000055e2fc6f36b0 R14: 000055e2fc6f3690 R15: 000055e2fc6f3460
Feb 21 13:18:58 fedora kernel: Modules linked in: bnep snd_seq_dummy snd_hrtimer bluetooth ecdh_generic ecc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables rfkill nfnetlink ip6table_filter ip6_tables iptable_filter vsock_loopback vmw_vsock_virtio_transport_common snd_seq_midi snd_seq_midi_event vmw_vsock_vmci_transport vsock sunrpc intel_rapl_msr intel_rapl_common rapl vmw_balloon snd_ens1371 snd_ac97_codec ac97_bus snd_rawmidi snd_seq snd_seq_device snd_pcm joydev pcspkr snd_timer snd soundcore gameport vmw_vmci i2c_piix4 zram ip_tables crct10dif_pclmul crc32_pclmul vmwgfx crc32c_intel drm_kms_helper ghash_clmulni_intel mptspi e1000 cec ttm scsi_transport_spi serio_raw drm mptscsih mptbase ata_generic pata_acpi fuse
Feb 21 13:18:58 fedora kernel: CR2: 0000000000000030
Feb 21 13:18:58 fedora kernel: ---[ end trace 87ac94f887eabb67 ]---
Feb 21 13:18:58 fedora kernel: RIP: 0010:btrfs_device_init_dev_stats+0x26/0x210
Feb 21 13:18:58 fedora kernel: Code: 0f 1f 40 00 0f 1f 44 00 00 41 57 49 89 f7 41 56 41 55 45 31 ed 41 54 55 53 48 83 ec 40 48 8b 47 38 48 c7 44 24 2f 00 00 00 00 <48> 8b 70 30 c6 44 24 3f 00 48 c7 44 24 37 00 00 00 00 48 85 f6 74
Feb 21 13:18:58 fedora kernel: RSP: 0018:ffffb477c3ce3b68 EFLAGS: 00010282
Feb 21 13:18:58 fedora kernel: RAX: 0000000000000000 RBX: ffff8ad3094ea098 RCX: 0000000000000070
Feb 21 13:18:58 fedora kernel: RDX: ffff8ad31b728000 RSI: ffff8ad328c8c2a0 RDI: ffff8ad3094eac00
Feb 21 13:18:58 fedora kernel: RBP: ffff8ad328c8c2a0 R08: 0000000000000070 R09: 0000000000000000
Feb 21 13:18:58 fedora kernel: R10: ffff8ad328c8c2a0 R11: 0000000000000000 R12: ffff8ad3094ea000
Feb 21 13:18:58 fedora kernel: R13: 0000000000000000 R14: ffff8ad3094eac00 R15: ffff8ad328c8c2a0
Feb 21 13:18:58 fedora kernel: FS: 00007fda4979fc40(0000) GS:ffff8ad37be40000(0000) knlGS:0000000000000000
Feb 21 13:18:58 fedora kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 21 13:18:58 fedora kernel: CR2: 0000000000000030 CR3: 000000005faca001 CR4: 00000000003706e0
Feb 21 13:18:59 fedora abrt-dump-journal-oops[700]: abrt-dump-journal-oops: Found oopses: 1
Feb 21 13:18:59 fedora abrt-dump-journal-oops[700]: abrt-dump-journal-oops: Creating problem directories
Feb 21 13:19:00 fedora abrt-notification[1631]: System encountered a non-fatal error in btrfs_init_dev_stats()
Feb 21 13:19:00 fedora abrt-dump-journal-oops[700]: Reported 1 kernel oopses to Abrt