Thanks Qu and David for your prompt attention!

Qu Wenruo <[email protected]> writes:
>> following tree-dumps:
>>
>> sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
>> sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
>>
>> The root dump is at https://termbin.com/lz0l and the block dump at
>> https://termbin.com/oev5 . The number 1350630375424 does not occur in
>> the root dump. The root dump has 16715 lines, the block dump only 645.
>
> Super nice move, it shows the corruption and the cause.
>
> item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
> item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
> item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>
> See the key objectid of key 67 is way larger than item 66/68.
>
> And furthermore, it indeed looks like a bit rot:
> 0x18f19810000 (1714119835648)
> 0x98f19814000 (10510212874240)
> 0x18f19818000 (1714119868416)
>
> See one bit got flipped.

Thanks for the explanation!

> I don't know it's corrupted in memory or on the SSD, although I tend to
> believe it's caused by memory bit flip.
> But anyway, it can be fixed by patching the corrupted leaf manually.
>
> I'm working on the fix.
> Please make sure there is no write into the fs (just in case, since the
> fs should be RO).
>
> And prepare a LiveUSB on which you could compile btrfs-progs (needs some
> dependency).
>
> It shouldn't take me too long time crafting the fix.

Thanks Qu! I see that ArchLinux LiveUSB is based on linux 4.20.0, but
4.20.1 contains some btrfs fixes. Should I make sure to be at least on
4.20.1 for this?

David Sterba <[email protected]> writes:
> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
>> See the key objectid of key 67 is way larger than item 66/68.
>>
>> And furthermore, it indeed looks like a bit rot:
>> 0x18f19810000 (1714119835648)
>> 0x98f19814000 (10510212874240)
>> 0x18f19818000 (1714119868416)
>>
>> See one bit got flipped.
>> I don't know it's corrupted in memory or on the SSD, although I tend to
>> believe it's caused by memory bit flip.
>
> Single bit flips are almost always caused by RAM, not storage (that
> fails in larger blocks or does not even return any data)
>> But anyway, it can be fixed by patching the corrupted leaf manually.
>
> That will fix one instance of the corrupted key, without an analysis how
> far the wrong key got spred it's still risky.

How could I analyse this?
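
The single-bit flip Qu points out in the quoted leaf dump can be checked with a few lines of Python. This is only a sketch: the three objectids are copied from items 66 to 68 above, while the "expected" objectid for item 67 is an assumption, taken as the midpoint between its two neighbours (which are 32768 bytes apart).

```python
# Objectids copied from the leaf dump quoted above.
prev_key = 1714119835648    # item 66
bad_key = 10510212874240    # item 67, the suspicious one
next_key = 1714119868416    # item 68

# Assumption: item 67 should sit midway between its neighbours.
expected = prev_key + (next_key - prev_key) // 2

# XOR leaves a 1 only where the two values differ.
diff = bad_key ^ expected
flipped_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]

print("expected:", hex(expected))
print("found:   ", hex(bad_key))
print("flipped bits:", flipped_bits)
```

If the midpoint assumption holds, the list contains exactly one bit position, which is consistent with the bit-flip explanation given in the thread.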
