>>> Zygo Blaxell <[email protected]> wrote on 18.12.2020 at 02:51 in
message <[email protected]>:
> On Thu, Dec 17, 2020 at 02:48:00PM +0100, Ulrich Windl wrote:
>>
>> >>> Zygo Blaxell <[email protected]> wrote on 15.12.2020 at
>> 19:18 in message <[email protected]>:
>> > On Fri, Dec 11, 2020 at 03:25:47PM +0100, Ulrich Windl wrote:
>> >> Hi!
>> >>
>> >> While configuring a VM environment in a cluster, I had set up an SLES15
>> >> SP2 test VM using BtrFS. Due to some problem with libvirt (or the
>> >> VirtualDomain RA) the VM was active on more than one cluster node at a
>> >> time, corrupting the filesystem beyond repair, it seems:
>> >> hvc0:rescue:~ # btrfs check /dev/xvda2
>> >> Opening filesystem to check...
>> >> Checking filesystem on /dev/xvda2
>> >> UUID: 1b651baa-327b-45fe-9512-e7147b24eb49
>> >> [1/7] checking root items
>> >> ERROR: child eb corrupted: parent bytenr=1107230720 item=75 parent level=1
>> >> child level=1
>> >> ERROR: failed to repair root items: Input/output error
>> >> hvc0:rescue:~ # btrfsck -b /dev/xvda2
>> >> Opening filesystem to check...
>> >> Checking filesystem on /dev/xvda2
>> >> UUID: 1b651baa-327b-45fe-9512-e7147b24eb49
>> >> [1/7] checking root items
>> >> ERROR: child eb corrupted: parent bytenr=1106952192 item=75 parent level=1
>> >> child level=1
>> >> ERROR: failed to repair root items: Input/output error
>> >> hvc0:rescue:~ # btrfsck --repair /dev/xvda2
>> >> enabling repair mode
>> >> Opening filesystem to check...
>> >> Checking filesystem on /dev/xvda2
>> >> UUID: 1b651baa-327b-45fe-9512-e7147b24eb49
>> >> [1/7] checking root items
>> >> ERROR: child eb corrupted: parent bytenr=1107230720 item=75 parent level=1
>> >> child level=1
>> >> ERROR: failed to repair root items: Input/output error
>> >>
>> >> Two questions arise:
>> >> 1) Can't the kernel set some "open flag" early when opening the
>> >> filesystem, and refuse to open it again (in the other VM) when the flag
>> >> is set?
>> >> That could avoid such situations, I guess.
>> >
>> > If btrfs wrote "the filesystem is open" to the disk, the filesystem
>> > would not be mountable after a crash.
>> >
>> > The kernel does set an "open flag" (it detects that it is about to mount
>> > the same btrfs by uuid, and does something like a bind mount instead),
>> > but that applies only to multiple btrfs mounts on the _same_ kernel.
>> > In your case there are multiple kernels present (one in each node)
>> > and there's no way for them to communicate with each other.
>> >
>> > There are at least 3 different ways libvirt or other hosting
>> > infrastructure software on the VM host could have avoided passing the
>> > same physical device to multiple VM guests. I would suggest implementing
>> > some or all of them.
>> >
>> >> 2) Can't btrfs check try somewhat harder to rescue anything, or is
>> >> the fs structure in a way that everything is lost?
>> >>
>> >> What really puzzles me is this:
>> >> There are several snapshots and subvolumes on the BtrFS device. It's
>> >> hard to believe that absolutely nothing seems to be recoverable.
>> >
>> > The most likely outcome is that the root tree nodes and most of the
>> > interior nodes of all the filesystem trees are broken. The kernel
>> > relies on the trees to work -- everything in btrfs except the superblocks
>> > can be at any location on disk -- so the filesystem will be unreadable by
>> > the kernel. Only recovery tools would be able to read the filesystem now.
>> >
>> > Recovery requires a brute-force search of the disk to find as many
>> > surviving leaf nodes as possible and rebuild the filesystem trees.
>> > This is more or less what 'btrfs check --repair --init-extent-tree' does.
>>
>> Hi!
>>
>> As I didn't have a backup (it was just a test VM to test HA cluster
>> configuration), I tried your command.
>> It finished rather quickly even with little RAM, but found *many* problems:
>> ...
>> Deleting bad dir index [715,96,8] root 257
>> Deleting bad dir index [257,96,14] root 257
>> Deleting bad dir index [257,96,15] root 257
>> Deleting bad dir index [259,96,21] root 257
>> Deleting bad dir index [291,96,6] root 257
>> Deleting bad dir index [1804,96,2] root 257
>> Deleting bad dir index [1804,96,3] root 257
>> Deleting bad dir index [1804,96,4] root 257
>> Deleting bad dir index [1804,96,5] root 257
>> Deleting bad dir index [320,96,5] root 257
>> Deleting bad dir index [1805,96,2] root 257
>> Deleting bad dir index [257,96,16] root 257
>> Deleting bad dir index [326,96,6] root 257
>> ERROR: errors found in fs roots
>> found 30851072 bytes used, error(s) found
>> total csum bytes: 1370452
>> total tree bytes: 3211264
>> total fs tree bytes: 1458176
>> total extent tree bytes: 16384
>> btree space waste bytes: 597304
>> file data blocks allocated: 27607040
>>  referenced 27607040
>>
>> A subsequent "btrfs check /dev/xvda2" found many problems again:
>> ...
>> root 257 inode 7589 errors 2001, no inode item, link count wrong
>>         unresolved ref dir 1804 index 0 namelen 7 name main.cf filetype 1
>> errors 6, no dir index, no inode ref
>> root 257 inode 7590 errors 2001, no inode item, link count wrong
>>         unresolved ref dir 320 index 0 namelen 18 name postfix.configured
>> filetype 1 errors 6, no dir index, no inode ref
>> root 257 inode 7591 errors 2001, no inode item, link count wrong
>>         unresolved ref dir 1806 index 0 namelen 3 name pid filetype 2
>> errors 6, no dir index, no inode ref
>> root 257 inode 7593 errors 2001, no inode item, link count wrong
>>         unresolved ref dir 1805 index 0 namelen 11 name master.lock
>> filetype 1 errors 6, no dir index, no inode ref
>> root 257 inode 7641 errors 2001, no inode item, link count wrong
>>         unresolved ref dir 257 index 0 namelen 11 name snapper.log filetype
>> 1 errors 6, no dir index, no inode ref
>> root 257 inode 7644 errors 2001, no inode item, link count wrong
>>         unresolved ref dir 326 index 0 namelen 16 name logrotate.status
>> filetype 1 errors 6, no dir index, no inode ref
>> ERROR: errors found in fs roots
>> found 30965760 bytes used, error(s) found
>> total csum bytes: 1370452
>> total tree bytes: 3342336
>> total fs tree bytes: 1523712
>> total extent tree bytes: 81920
>> btree space waste bytes: 669123
>> file data blocks allocated: 27607040
>>  referenced 27607040
>>
>> Even after iterating a "normal" check a few times, I could not mount the
>> "repaired" filesystem:
>> hvc0:rescue:~ # mount -r /dev/xvda2 /mnt
>> mount.bin: /mnt: wrong fs type, bad option, bad superblock on /dev/xvda2,
>> missing codepage or helper program, or other error.
>> hvc0:rescue:~ # journalctl -f
>> -- Logs begin at Thu 2020-12-17 13:36:57 UTC. --
>> Dec 17 13:44:33 rescue kernel: BTRFS info (device xvda2): disk space caching
>> is enabled
>> Dec 17 13:44:33 rescue kernel: BTRFS info (device xvda2): has skinny extents
>> Dec 17 13:44:33 rescue kernel: BTRFS error (device xvda2): chunk 1048576 has
>> missing dev extent, have 0 expect 1
>> Dec 17 13:44:33 rescue kernel: BTRFS error (device xvda2): failed to verify
>> dev extents against chunks: -117
>> Dec 17 13:44:33 rescue kernel: BTRFS error (device xvda2): open_ctree failed
>> ^C
>>
>> I'm not hoping to recover the system to a usable state, but out of curiosity
>> I'd like to get an impression of what had survived and what had not.
>
> If you're missing dev extents, you'll need to run chunk-recover to
> brute-force scan for the chunk headers. But this is really stretching
> the abilities of the current tools.

Hi!
(Back at the time when I had developed a copy program for floppy disks, I had
a set of defective floppies for testing, so you can see this disaster as a
challenge for the tools.)

I tried:
hvc0:rescue:~ # btrfs rescue chunk-recover /dev/xvda2
Scanning: DONE in dev0
Check chunks successfully with no orphans
Chunk tree recovered successfully

I don't really understand what I'm doing, but as there were still too many
errors (and mount was refused), I re-tried "btrfs check --repair
--init-extent-tree", resulting in a core dump:
...
Repaired extent references for 1754910720
ref mismatch on [1766580224 4096] extent item 0, found 1
data backref 1766580224 root 257 owner 294 offset 90112 num_refs 0 not found
in extent tree
incorrect local backref count on 1766580224 root 257 owner 294 offset 90112
found 1 wanted 0 back 0x56103db41180
backpointer mismatch on [1766580224 4096]
adding new data backref on 1766580224 root 257 owner 294 offset 90112 found 1
Repaired extent references for 1766580224
btrfs unable to find ref byte nr 5586944 parent 0 root 2 owner 0 offset 0
transaction.c:195: btrfs_commit_transaction: BUG_ON `ret` triggered, value -5
btrfs(+0x51829)[0x56103c70f829]
btrfs(btrfs_commit_transaction+0x1ae)[0x56103c70fe1e]
btrfs(+0x1e73c)[0x56103c6dc73c]
btrfs(cmd_check+0x1124)[0x56103c7253d4]
btrfs(main+0x8e)[0x56103c6dcd2e]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f0caf2b934a]
btrfs(_start+0x2a)[0x56103c6dcf2a]
Aborted (core dumped)
hvc0:rescue:~ # btrfs version
btrfs-progs v4.19.1

Regards,
Ulrich

>
>> Regards,
>> Ulrich
>>
>> >
>> > If you run --init-extent-tree, assuming it works (you should not assume
>> > that it will work), you would then have to audit the filesystem contents
>> > to see what data was not recovered. At a minimum, you would lose a few
>> > hundred filesystem items, since each metadata leaf node contains around
>> > 200 items and you definitely will not recover them all.
>> > The data csum
>> > trees might not be in sync with the rest of the filesystem, so you can't
>> > rely on scrub to check data integrity. If this is successful, you will
>> > have a similar result to mounting ext4 on multiple VMs simultaneously --
>> > fsck runs, the filesystem is read-write again, but you don't get all
>> > the data back, nor even a list of data that was lost or corrupted.
>> >
>> > --init-extent-tree can be quite slow, especially if you don't have enough
>> > RAM to hold all the filesystem's metadata. It's still under development,
>> > so one possible outcome is that it crashes with an assertion failure
>> > and leaves you with an even more broken filesystem.
>> >
>> > It's usually faster and easier to mkfs and restore from backups instead.
>> >
>> >> I have this:
>> >> hvc0:rescue:~ # btrfs inspect-internal dump-super /dev/xvda2
>> >> superblock: bytenr=65536, device=/dev/xvda2
>> >> ---------------------------------------------------------
>> >> csum_type               0 (crc32c)
>> >> csum_size               4
>> >> csum                    0x659898f3 [match]
>> >> bytenr                  65536
>> >> flags                   0x1
>> >>                         ( WRITTEN )
>> >> magic                   _BHRfS_M [match]
>> >> fsid                    1b651baa-327b-45fe-9512-e7147b24eb49
>> >> metadata_uuid           1b651baa-327b-45fe-9512-e7147b24eb49
>> >> label
>> >> generation              280
>> >> root                    1107214336
>> >> sys_array_size          97
>> >> chunk_root_generation   35
>> >> root_level              0
>> >> chunk_root              1048576
>> >> chunk_root_level        0
>> >> log_root                0
>> >> log_root_transid        0
>> >> log_root_level          0
>> >> total_bytes             10727960576
>> >> bytes_used              1461825536
>> >> sectorsize              4096
>> >> nodesize                16384
>> >> leafsize (deprecated)   16384
>> >> stripesize              4096
>> >> root_dir                6
>> >> num_devices             1
>> >> compat_flags            0x0
>> >> compat_ro_flags         0x0
>> >> incompat_flags          0x163
>> >>                         ( MIXED_BACKREF |
>> >>                           DEFAULT_SUBVOL |
>> >>                           BIG_METADATA |
>> >>                           EXTENDED_IREF |
>> >>                           SKINNY_METADATA )
>> >> cache_generation        280
>> >> uuid_tree_generation    40
>> >> dev_item.uuid           2abdf93e-2f2d-4eef-a1d8-9325f809ebce
>> >> dev_item.fsid
>> >> 1b651baa-327b-45fe-9512-e7147b24eb49 [match]
>> >> dev_item.type           0
>> >> dev_item.total_bytes    10727960576
>> >> dev_item.bytes_used     2436890624
>> >> dev_item.io_align       4096
>> >> dev_item.io_width       4096
>> >> dev_item.sector_size    4096
>> >> dev_item.devid          1
>> >> dev_item.dev_group      0
>> >> dev_item.seek_speed     0
>> >> dev_item.bandwidth      0
>> >> dev_item.generation     0
>> >>
>> >> Regards,
>> >> Ulrich Windl
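
P.S.: One of the host-side safeguards Zygo alludes to (refusing to hand the
same disk to two guests) amounts to taking an exclusive advisory lock on the
image before the guest starts; libvirt's virtlockd implements this idea. A
minimal sketch in Python, assuming a file-backed image -- the helper name
`acquire_disk_lock` is illustrative, not a real libvirt API:

```python
import fcntl
import os

def acquire_disk_lock(image_path):
    """Take an exclusive advisory lock on a disk image before starting
    a guest.

    Returns an open file descriptor that must stay open for the whole
    lifetime of the guest (closing it releases the lock), or None if
    another VM-manager process already holds the lock.
    """
    fd = os.open(image_path, os.O_RDWR)
    try:
        # LOCK_NB makes the attempt fail immediately instead of
        # blocking, so a second guest start is refused, not queued.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

Note that flock() is visible only within one host; in a cluster where guests
can start on different nodes (the failure mode above), the lock has to live in
a cluster-wide lock manager such as sanlock or a DLM, which is exactly why the
two kernels here could not see each other's mounts.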
