Thanks Qu and David for your prompt attention!

Qu Wenruo <[email protected]> writes:
>> following tree-dumps:
>> 
>>   sudo btrfs inspect dump-tree -t root /dev/mapper/vg1-root > /tmp/btrfsdumproot
>>   sudo btrfs inspect dump-tree -b 1350630375424 /dev/mapper/vg1-root > /tmp/btrfsdump1350630375424
>> 
>> The root dump is at https://termbin.com/lz0l and the block dump at
>> https://termbin.com/oev5 . The number 1350630375424 does not occur in
>> the root dump. The root dump has 16715 lines, the block dump only 645.
>
> Super nice move, it shows the corruption and the cause.
>
>       item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
>       item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
>       item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
>
> See the key objectid of key 67 is way larger than item 66/68.
>
> And furthermore, it indeed looks like a bit rot:
> 0x18f19810000 (1714119835648)
> 0x98f19814000 (10510212874240)
> 0x18f19818000 (1714119868416)
>
> See one bit got flipped.

Thanks for the explanation!
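
To convince myself, I checked the arithmetic with a small Python sketch. It assumes that the intended objectid of item 67 is the midpoint of its two neighbours (0x18f19814000); that value is my guess from the 0x4000 spacing and is not confirmed by the dump. The helper name flipped_bits is mine.

```python
def flipped_bits(observed: int, expected: int) -> list[int]:
    """Return the positions (0 = least significant) of bits that differ."""
    diff = observed ^ expected
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]

prev_key = 0x18f19810000  # item 66
bad_key = 0x98f19814000   # item 67, the suspicious one
next_key = 0x18f19818000  # item 68

# Assumption: the intended key lies exactly halfway between its neighbours.
expected_key = (prev_key + next_key) // 2

print(hex(expected_key))
print(flipped_bits(bad_key, expected_key))
```

Under that assumption the two values differ in exactly one bit, bit 43, which matches the single bit flip you describe.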

> I don't know it's corrupted in memory or on the SSD, although I tend to
> believe it's caused by memory bit flip.
> But anyway, it can be fixed by patching the corrupted leaf manually.
>
> I'm working on the fix.
> Please make sure there is no write into the fs (just in case, since the
> fs should be RO).
>
> And prepare a LiveUSB on which you could compile btrfs-progs (needs some
> dependency).
>
> It shouldn't take me too long to craft the fix.

Thanks Qu! I see that the Arch Linux LiveUSB is based on Linux 4.20.0,
but 4.20.1 contains some btrfs fixes. Should I make sure to be on at
least 4.20.1 for this?

David Sterba <[email protected]> writes:
> On Tue, Jan 15, 2019 at 07:48:47PM +0800, Qu Wenruo wrote:
>> See the key objectid of key 67 is way larger than item 66/68.
>> 
>> And furthermore, it indeed looks like a bit rot:
>> 0x18f19810000 (1714119835648)
>> 0x98f19814000 (10510212874240)
>> 0x18f19818000 (1714119868416)
>> 
>> See one bit got flipped.
>>
>> I don't know it's corrupted in memory or on the SSD, although I tend to
>> believe it's caused by memory bit flip.
>
> Single bit flips are almost always caused by RAM, not storage (which
> fails in larger blocks or does not return any data at all).
>
>> But anyway, it can be fixed by patching the corrupted leaf manually.
>
> That will fix one instance of the corrupted key; without an analysis of
> how far the wrong key got spread, it's still risky.

How could I analyse this?
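
One rough idea from my side, in case it is a useful starting point: scan the text dumps for item keys that are out of order within a leaf. The sketch below assumes the "item N key (OBJECTID TYPE OFFSET)" line format seen in the block dump, compares only the numeric objectid (not the full key), and skips keys whose objectid is printed as a name. The function name out_of_order_items is mine, and this only looks at the dump text, not at the filesystem itself.

```python
import re

# Matches lines such as:
#   item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
# Keys whose objectid is printed as a name are not matched and are skipped.
KEY_RE = re.compile(r"item (\d+) key \((\d+) (\S+) (\d+)\)")

def out_of_order_items(dump_text):
    """Return (item index, objectid) pairs whose objectid is smaller than
    that of the previous item in the same leaf."""
    suspects = []
    prev_objectid = None
    for line in dump_text.splitlines():
        match = KEY_RE.search(line)
        if match is None:
            continue
        index = int(match.group(1))
        objectid = int(match.group(2))
        if index == 0:
            # Item numbering restarts at 0 for every leaf in the dump.
            prev_objectid = None
        if prev_objectid is not None and objectid < prev_objectid:
            suspects.append((index, objectid))
        prev_objectid = objectid
    return suspects

sample = """\
    item 66 key (1714119835648 METADATA_ITEM 0) itemoff 13325 itemsize 33
    item 67 key (10510212874240 METADATA_ITEM 0) itemoff 13283 itemsize 42
    item 68 key (1714119868416 METADATA_ITEM 0) itemoff 13250 itemsize 33
"""
print(out_of_order_items(sample))
```

Note that this flags the item after the outlier (item 68 here), so the item just before each hit is the one to look at. I could run it over /tmp/btrfsdump1350630375424 and other dumps if that would tell us anything.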
