2018-01-23 14:06 GMT+01:00 Claes Fransson <claes.v.frans...@gmail.com>:
> 2018-01-22 22:22 GMT+01:00 Hugo Mills <h...@carfax.org.uk>:
>> On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
>>> Hi!
>>>
>>> I really like the features of BTRFS, especially deduplication,
>>> snapshotting and checksumming. However, when using it on my laptop the
>>> last couple of years, it has become corrupted a lot of times.
>>> Sometimes I have managed to fix the problems (at least so much that I
>>> can continue to use the filesystem) with check --repair, but several
>>> times I had to recreate the file system and reinstall the operating
>>> system.
>>>
>>> I am guessing the corruptions might be the results of unclean
>>> shutdowns, mostly after system hangs, but also because of running out
>>> of battery sometimes?
>>> Furthermore, the power-led has recently started blinking (also when
>>> the power-cable is plugged in), I guess because of an old and bad
>>> battery. Maybe the current corruption also can have something to do
>>> with this? However I almost always run with power cable plugged in in
>>> the last year, only on battery a few seconds a few times when moving
>>> the laptop.
>>>
>>> Currently, I can only mount the filesystem readonly, it goes readonly
>>> automatically if I try to mount it normally.
>>>
>>> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
>>> localhost:~ # uname -r
>>> 4.14.13-1-default
>>> localhost:~ # btrfs --version
>>> btrfs-progs v4.14.1
>>>
>>> localhost:~ # btrfs check -p /dev/sda12
>>> Checking filesystem on /dev/sda12
>>
>> [fixing up bad paste]
>>
>>> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>> bad key ordering 159 160 bad block 690436964352
>>> ERROR: errors found in extent allocation tree or chunk allocation
>>> checking free space cache [.]
>>> checking fs roots [o]
>>> checking csums
>>> bad key ordering 159 160
>>> Error looking up extent record -1
>>
>> [snip]
>>
>>> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352 /dev/sda12
>>> btrfs-progs v4.14.1
>>> leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>>> leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>>> fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>> chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
>>> .
>>> .
>>> .
>>> item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>>> refs 1 gen 821 flags DATA
>>> extent data backref root 287 objectid 51665 offset 0 count 1
>>> item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>>> refs 1 gen 821 flags DATA
>>> extent data backref root 287 objectid 51666 offset 0 count 1
>>> item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
>>> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
>>> btrfs(+0x365c6)[0x55bdfaada5c6]
>>> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
>>> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
>>> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
>>> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
>>> btrfs(main+0x7d)[0x55bdfaac7d4d]
>>> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
>>> btrfs(_start+0x2a)[0x55bdfaac7e5a]
>>> Aborted (core dumped)
>>
>> Wow, I've never seen it do that before. It's the next thing I'd
>> have asked for, so it's good you've preempted it.
>>
>> The main thing is that bad key ordering is almost always due to RAM
>> corruption. That's either bad RAM, or dodgy power regulation -- the
>> latter could be the PSU, or capacitors on the motherboard. (In this
>> case, it might also be something funny with the battery).
>>
>> I would definitely recommend a long run of memtest86. At least 8
>> hours, preferably 24. If you get errors repeatedly in the same place,
>> it's the RAM.
>> If they appear randomly, it's probably the power
>> regulation.
>>
> Thanks for the suggestion, I will try to do this in the next few days.
>
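For anyone following along, here is a minimal sketch of what a "bad key
ordering" check amounts to, and why a single flipped bit in RAM is enough
to trigger it. Btrfs keys are (objectid, type, offset) triples compared in
that order. The first three objectids are taken from items 157-159 in the
dump above; the fourth key, the choice of bit to flip, and the helper names
`find_bad_key_ordering` and `flip_bit` are assumptions made up for
illustration. This is not a reconstruction of what actually happened to
this leaf.

```python
# Illustrative only: btrfs keeps the items of a leaf sorted by key, where a
# key is the triple (objectid, type, offset) compared in that order.
EXTENT_ITEM = 168  # numeric value of BTRFS_EXTENT_ITEM_KEY


def find_bad_key_ordering(keys):
    """Return the index i of the first pair (i, i+1) that is not strictly
    ascending, or None if all keys are in order."""
    for i in range(len(keys) - 1):
        if keys[i] >= keys[i + 1]:
            return i
    return None


def flip_bit(key, bit):
    """Simulate a single-bit memory error in the objectid of a key."""
    objectid, key_type, offset = key
    return (objectid ^ (1 << bit), key_type, offset)


# Objectids of items 157-159 from the dump; the last key is hypothetical.
leaf = [
    (22732500992, EXTENT_ITEM, 16384),
    (22732517376, EXTENT_ITEM, 16384),
    (22732533760, EXTENT_ITEM, 16384),
    (22732550144, EXTENT_ITEM, 16384),  # assumed next extent, not from the dump
]

corrupted = list(leaf)
corrupted[2] = flip_bit(corrupted[2], 35)  # arbitrary bit, chosen for the demo

print(find_bad_key_ordering(leaf))       # None: keys are strictly ascending
print(find_bad_key_ordering(corrupted))  # 2: key 2 now sorts after key 3
```
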
I hadn't noticed before that the laptop actually has RAM modules from
different vendors: one 8GB module by Samsung and one 4GB module by
Kingston! Maybe that is a source of the corruptions.

I also found that there was indeed a new firmware version for my SSD,
so I have now updated its firmware to the newest version. Unfortunately
I couldn't find any information about what issues it was supposed to
fix.

The laptop already has the latest BIOS version provided by ASUS for the
model.

I have not yet run memtest86.

Claes

>> [snip]
>>
>>>
>>> The filesystem had become pretty full, I had planned to increase the
>>> Btrfs-partition size before it became corrupt.
>>>
>>> Active kernel when the filesystem went read only: OpenSUSE Linux
>>> 4.14.14-1.geef6178-default, from the
>>> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
>>> repository.
>>>
>>> Fstab mount options: noatime,autodefrag (I have been using the option
>>> nossd with older kernels one period in the past on the filesystem).
>>>
>>> If it matters, I have been running duperemove many times on the
>>> filesystem since creation.
>>>
>>> To test the RAM, I have been running mprime Blend-test for 24 hours
>>> after the corruption without any error or warning.
>>
>> Of all of the bad key order errors I've seen (dozens), I think
>> there were a whole two which turned out not to be obviously related to
>> corrupt RAM. I still say that it's most likely the hardware.
>
> Okay, thank you for sharing your experience with me.
>
>>
>>> Is there a way I can try to repair this filesystem without the need to
>>> recreate it and reinstall the operating system? A reinstall including
>>> all currently installed packages, and restoring all current system
>>> settings, would probably take some time for me to do.
>>> If it is currently not repairable, it would be nice if this kind of
>>> corruption could be repaired in the future, even if losing a few
>>> files.
>>> Or if the corruptions could be avoided in the first place.
>>
>> Given that the current tools crash, the answer's a definite
>> no. However, if you can get a developer interested, they may be able
>> to write a fix for it, given an image of the FS (using btrfs-image).
>>
> Okay, will try to produce and upload an image within the next week.
>
>
>> [snip]
>>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>>> on the laptop, only on the Btrfs file systems.
>>
>> You've never _noticed_ them. :)
>>
>> Hugo.
>>
>> --
>> Hugo Mills | ... one ping(1) to rule them all, and in the
>> hugo@... carfax.org.uk | darkness bind(2) them.
>> http://carfax.org.uk/ |
>> PGP: E2AB1DE4 |
>> Illiad
>
> Thank you for your answers.
>
> Claes