On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
> Hi!
> 
> I really like the features of BTRFS, especially deduplication,
> snapshotting and checksumming. However, when using it on my laptop the
> last couple of years, it has become corrupted many times.
> Sometimes I have managed to fix the problems (at least so much that I
> can continue to use the filesystem) with check --repair, but several
> times I had to recreate the file system and reinstall the operating
> system.
> 
> I am guessing the corruptions might be the results of unclean
> shutdowns, mostly after system hangs, but also because of running out
> of battery sometimes?
> Furthermore, the power-led has recently started blinking (also when
> the power-cable is plugged in), I guess because of an old and bad
> battery. Maybe the current corruption also can have something to do
> with this? However, I have almost always run with the power cable
> plugged in over the last year, only on battery for a few seconds a few
> times when moving the laptop.
> 
> Currently, I can only mount the filesystem readonly, it goes readonly
> automatically if I try to mount it normally.
> 
> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
> localhost:~ # uname -r
> 4.14.13-1-default
> localhost:~ # btrfs --version
> btrfs-progs v4.14.1
> 
> localhost:~ # btrfs check -p /dev/sda12
> Checking filesystem on /dev/sda12

[fixing up bad paste]

> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
> bad key ordering 159 160 bad block 690436964352
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache [.]
> checking fs roots [o]
> checking csums
> bad key ordering 159 160
> Error looking up extent record -1

[snip]

> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352 /dev/sda12
> btrfs-progs v4.14.1
>     leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>     leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>     fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>     chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
> .
> .
> .
>         item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>                 refs 1 gen 821 flags DATA
>                 extent data backref root 287 objectid 51665 offset 0 count 1
>         item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>                 refs 1 gen 821 flags DATA
>                 extent data backref root 287 objectid 51666 offset 0 count 1
>         item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
> btrfs(+0x365c6)[0x55bdfaada5c6]
> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
> btrfs(main+0x7d)[0x55bdfaac7d4d]
> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
> btrfs(_start+0x2a)[0x55bdfaac7e5a]
> Aborted (core dumped)

   Wow, I've never seen dump-tree crash like that before. The dump is
the next thing I'd have asked for, so it's good you've preempted it.

   The main thing is that bad key ordering is almost always due to RAM
corruption. That's either bad RAM, or dodgy power regulation -- the
latter could be the PSU, or capacitors on the motherboard. (In this
case, it might also be something funny with the battery).
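
   To make "bad key ordering" concrete: items in a btrfs leaf are
sorted by key, compared as (objectid, type, offset), and the check
complains when an adjacent pair is not strictly ascending. A minimal
Python sketch of that comparison follows. It is an illustration only,
not the btrfs-progs code. Items 157-159 are copied from the dump
above; item 160 is a made-up value (the dump never got that far), and
the helper name first_bad_pair is hypothetical too.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

EXTENT_ITEM = 168  # numeric value of BTRFS_EXTENT_ITEM_KEY


class Key(NamedTuple):
    """A btrfs key; tuples compare field by field, in this order."""
    objectid: int
    type: int
    offset: int


def first_bad_pair(keys: Sequence[Key], start: int = 0) -> Optional[Tuple[int, int]]:
    """Return the item numbers of the first adjacent pair that is not
    strictly ascending, or None if the whole leaf is in order.
    `start` is the item number of keys[0]."""
    for i in range(len(keys) - 1):
        if keys[i] >= keys[i + 1]:
            return (start + i, start + i + 1)
    return None


keys = [
    Key(22732500992, EXTENT_ITEM, 16384),  # item 157, from the dump
    Key(22732517376, EXTENT_ITEM, 16384),  # item 158, from the dump
    Key(22732533760, EXTENT_ITEM, 16384),  # item 159, from the dump
    Key(4096, EXTENT_ITEM, 16384),         # item 160: HYPOTHETICAL value
]

print(first_bad_pair(keys, start=157))
```

   With the hypothetical item 160, this reports the pair (159, 160),
which is the same shape as the message from btrfs check. A key that
was damaged in RAM before being written out sorts into the wrong
place in exactly this way.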

   I would definitely recommend a long run of memtest86. At least 8
hours, preferably 24. If you get errors repeatedly in the same place,
it's the RAM. If they appear randomly, it's probably the power
regulation.
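
   If you end up with a long list of failures, the "same place or
random" question can be eyeballed with something like the Python
sketch below. It assumes you have transcribed the failures by hand
into (pass number, address) pairs; memtest86 does not produce that
format itself, and the function name recurring_fraction is made up
for this example. Treat the result as a rough hint, not a diagnosis.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def recurring_fraction(failures: Iterable[Tuple[int, int]]) -> float:
    """Fraction of distinct failing addresses that failed in more than
    one pass. Near 1.0 suggests the same cells keep failing (RAM);
    near 0.0 suggests failures scattered at random (power)."""
    passes_by_addr = defaultdict(set)
    for pass_no, addr in failures:
        passes_by_addr[addr].add(pass_no)
    if not passes_by_addr:
        return 0.0
    recurring = sum(1 for p in passes_by_addr.values() if len(p) > 1)
    return recurring / len(passes_by_addr)


# Hypothetical transcriptions, for illustration only.
same_place = [(1, 0x3F2A1000), (2, 0x3F2A1000), (3, 0x3F2A1000)]
scattered = [(1, 0x00101000), (2, 0x2B004000), (3, 0x7C9E8000)]

print(recurring_fraction(same_place), recurring_fraction(scattered))
```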

[snip]

> 
> The filesystem had become pretty full, I had planned to increase the
> Btrfs-partition size before it became corrupt.
> 
> Active kernel when the filesystem went read only: OpenSUSE Linux
> 4.14.14-1.geef6178-default, from the
> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
> repository.
> 
> Fstab mount options: noatime,autodefrag (I have been using the option
> nossd with older kernels one period in the past on the filesystem).
> 
> If it matters, I have been running duperemove many times on the
> filesystem since creation.
> 
> To test the RAM, I have been running mprime Blend-test for 24 hours
> after the corruption without any error or warning.

   Of all of the bad key order errors I've seen (dozens), I think
there were a whole two which turned out not to be obviously related to
corrupt RAM. I still say that it's most likely the hardware.

> Is there a way I can try to repair this filesystem without the need to
> recreate it and reinstall the operating system? A reinstall including
> all currently installed packages, and restoring all current system
> settings, would probably take some time for me to do.
> If it is currently not repairable, it would be nice if this kind of
> corruption could be repaired in the future, even if losing a few
> files. Or if the corruptions could be avoided in the first place.

   Given that the current tools crash, the answer's a definite
no. However, if you can get a developer interested, they may be able
to write a fix for it, given an image of the FS (using btrfs-image).

[snip]
> I have never noticed any corruptions on the NTFS and Ext4 file systems
> on the laptop, only on the Btrfs file systems.

   You've never _noticed_ them. :)

   Hugo.

-- 
Hugo Mills             | ... one ping(1) to rule them all, and in the
hugo@... carfax.org.uk | darkness bind(2) them.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                                Illiad
