On Thu, Jan 3, 2019 at 3:13 PM Nazar Mokrynskyi <[email protected]> wrote:
> root@ubuntu:~# btrfsck /dev/mapper/luks-739967f1-9770-470a-a031-8d8b8bcdb350
> warning, bad space info total_bytes 2155872256 used 2155876352
> warning, bad space info total_bytes 3229614080 used 3229618176
> warning, bad space info total_bytes 4303355904 used 4303360000
> warning, bad space info total_bytes 5377097728 used 5377101824
> warning, bad space info total_bytes 6450839552 used 6450843648
> warning, bad space info total_bytes 7524581376 used 7524585472
> warning, bad space info total_bytes 8598323200 used 8598327296
> warning, bad space info total_bytes 9672065024 used 9672069120
> warning, bad space info total_bytes 10745806848 used 10745810944
> warning, bad space info total_bytes 11819548672 used 11819552768
> warning, bad space info total_bytes 12893290496 used 12893294592
> warning, bad space info total_bytes 13967032320 used 13967036416
> warning, bad space info total_bytes 15040774144 used 15040778240
> warning, bad space info total_bytes 16114515968 used 16114520064
> warning, bad space info total_bytes 17188257792 used 17188261888
> warning, bad space info total_bytes 18261999616 used 18262003712
> warning, bad space info total_bytes 19335741440 used 19335745536
> warning, bad space info total_bytes 20409483264 used 20409487360
> warning, bad space info total_bytes 21483225088 used 21483229184
> warning, bad space info total_bytes 22556966912 used 22556971008
> warning, bad space info total_bytes 23630708736 used 23630712832
> warning, bad space info total_bytes 24704450560 used 24704454656
> warning, bad space info total_bytes 25778192384 used 25778196480
> warning, bad space info total_bytes 26851934208 used 26851938304
> warning, bad space info total_bytes 27925676032 used 27925680128
> warning, bad space info total_bytes 28999417856 used 28999421952
> warning, bad space info total_bytes 30073159680 used 30073163776
> warning, bad space info total_bytes 31146901504 used 31146905600
> warning, bad space info total_bytes 32220643328 used 32220647424
> Checking filesystem on /dev/mapper/luks-739967f1-9770-470a-a031-8d8b8bcdb350
> UUID: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
> checking extents
> extent item 3114475520 has multiple extent items
> ref mismatch on [3114475520 4096] extent item 1, found 2
> backref bytes do not match extent backref, bytenr=3114475520, ref bytes=4096, backref bytes=36864
> backpointer mismatch on [3114475520 4096]
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 39409483813 bytes used, error(s) found
> total csum bytes: 35990412
> total tree bytes: 2395095040
> total fs tree bytes: 2249408512
> total extent tree bytes: 96534528
> btree space waste bytes: 456622616
> file data blocks allocated: 174319587328
> referenced 61677670400

I haven't seen "bad space info" before, and searching the list archive I only came up with one other report of it, to which no developer replied.

What do you get with 'btrfs check --mode=lowmem'? This is a different implementation of check, and it might reveal some additional information useful to developers. It is wickedly slow, however.

> If this seems anything important and you want me to run some commands to
> check what happened exactly, I can start VMs with this partition image
> connected and do whatever is needed. I can't send image anywhere though,
> since it contains sensitive information.

If you use the btrfs-image -ss option, there won't be any sensitive information included. File names are hashed. Some short file or directory names can't be hashed (you'll see a warning), and those are replaced with random garbage instead. The image contains only metadata, no user data.

> P.S. I really wish BTRFS can stop accidentally corrupting itself one day.

I have several Btrfs file systems that I use constantly; all but one is at least three years old. I've never experienced corruption on any of them.
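For reference, the two suggestions above might look like this in practice. This is only a sketch: the device path is copied from the report, the image output path is just an example, and the actual invocations are left commented out since they operate on a real block device.

```shell
# Device path copied from the report above; substitute your own.
DEV=/dev/mapper/luks-739967f1-9770-470a-a031-8d8b8bcdb350

# Read-only check using the alternate lowmem implementation (much slower,
# but may report details the default mode misses):
#   btrfs check --mode=lowmem "$DEV"

# Sanitized, metadata-only image that is safe to share with developers
# (-ss hashes file names; output path is an example only):
#   btrfs-image -ss "$DEV" /tmp/btrfs-metadata.img
```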
I did discover some isolated corruption of file data (systemd-journald and also a VM image), but those were bugs that got fixed, and the file system metadata was fine. So I guess my bigger complaint with Btrfs isn't so much that corruption happens, but that it sometimes happens in a way we can't track down, so we don't know what to blame for the problem. Knowing what to blame is important for fixing bugs, and also for working around them in the meantime. So for sure it's frustrating, even if I haven't experienced it myself.

As Btrfs file systems get bigger, we're increasingly running into the very scalability complaints that existed pre-Btrfs. The idea of Btrfs is that it should always be consistent and not need offline repair. Doing an offline consistency check is just too expensive. Doing a check and repair that ends up making the situation worse is really expensive. That's a bad user experience that reasonably turns into a lost user. And in order to fix bugs and make Btrfs better, we need more and better bug reports, not fewer.

So... yeah. I'm not sure whether there is more tracing or debugging information that should be on by default in Btrfs, so we have a better chance of understanding these corruptions when they occur. Or what. We can't expect people to leave integrity checking always on; it's too expensive.

-- 
Chris Murphy
