Hi! I really like the features of BTRFS, especially deduplication, snapshotting and checksumming. However, while I have been using it on my laptop over the last couple of years, the filesystem has become corrupted many times. Sometimes I have managed to fix the problems with check --repair (at least well enough to keep using the filesystem), but several times I had to recreate the filesystem and reinstall the operating system.
I am guessing the corruption may be the result of unclean shutdowns, mostly after system hangs, but sometimes also from running out of battery? Furthermore, the power LED has recently started blinking (even when the power cable is plugged in), I guess because of an old, bad battery. Could the current corruption also have something to do with this? However, during the last year I have almost always run with the power cable plugged in, and only on battery for a few seconds a few times when moving the laptop.

Currently, I can only mount the filesystem read-only; it goes read-only automatically if I try to mount it normally. When booting an OpenSUSE Tumbleweed-20180119 live ISO:

localhost:~ # uname -r
4.14.13-1-default
localhost:~ # btrfs --version
btrfs-progs v4.14.1
localhost:~ # btrfs check -p /dev/sda12
Checking filesystem on /dev/sda12
UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
bad key ordering 159 160
bad block 690436964352
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache [.]
checking fs roots [o]
checking csums
bad key ordering 159 160
Error looking up extent record -1
Right section didn't have a record
There are no extents for csum range 22732550144-24923615232
Csum exists for 16303538176-24923615232 but there is no extent record
ERROR: errors found in csum tree
found 344063430663 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 453410816
total fs tree bytes: 0
total extent tree bytes: 452952064
btree space waste bytes: 140165932
file data blocks allocated: 108462080
 referenced 108462080
localhost:~ # btrfs inspect-internal dump-tree -b 690436964352 /dev/sda12
btrfs-progs v4.14.1
leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
. . .
    item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
        refs 1 gen 821 flags DATA
        extent data backref root 287 objectid 51665 offset 0 count 1
    item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
        refs 1 gen 821 flags DATA
        extent data backref root 287 objectid 51666 offset 0 count 1
    item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
btrfs(+0x365c6)[0x55bdfaada5c6]
btrfs(print_extent_item+0x424)[0x55bdfaadb284]
btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
btrfs(main+0x7d)[0x55bdfaac7d4d]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
btrfs(_start+0x2a)[0x55bdfaac7e5a]
Aborted (core dumped)

Running check --repair hangs after it reports "bad key ordering 159 160", with no disk activity but constantly high CPU usage.

localhost:~ # smartctl -a /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.13-1-default] (SUSE RPM)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SD8SB8U1T001122
Serial Number:    163076421231
LU WWN Device Id: 5 001b44 4a4dde388
Firmware Version: X4140000
User Capacity:    1,024,209,543,168 bytes [1.02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jan 22 15:28:46 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (    0) seconds.
Offline data collection capabilities: (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:    (   2) minutes.
Extended self-test routine recommended polling time: (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032 100   100   ---    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 100   100   ---    Old_age  Always  -           7692
 12 Power_Cycle_Count       0x0032 100   100   ---    Old_age  Always  -           496
165 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           1112516724361
166 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           1
167 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           25
168 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           44
169 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           753
170 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           0
171 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           0
172 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           0
173 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           18
174 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           57
184 End-to-End_Error        0x0032 100   100   ---    Old_age  Always  -           0
187 Reported_Uncorrect      0x0032 100   100   ---    Old_age  Always  -           0
188 Command_Timeout         0x0032 100   100   ---    Old_age  Always  -           1
194 Temperature_Celsius     0x0022 061   062   ---    Old_age  Always  -           39 (Min/Max 9/62)
199 UDMA_CRC_Error_Count    0x0032 100   100   ---    Old_age  Always  -           0
230 Unknown_SSD_Attribute   0x0032 100   100   ---    Old_age  Always  -           4733091251278
232 Available_Reservd_Space 0x0033 100   100   004    Pre-fail Always  -           100
233 Media_Wearout_Indicator 0x0032 100   100   ---    Old_age  Always  -           19202
234 Unknown_Attribute       0x0032 100   100   ---    Old_age  Always  -           32167
241 Total_LBAs_Written      0x0030 253   253   ---    Old_age  Offline -           22520
242 Total_LBAs_Read         0x0030 253   253   ---    Old_age  Offline -           183882
244 Unknown_Attribute       0x0032 000   100   ---    Old_age  Always  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        7570             -
# 2  Extended offline  Completed without error  00%        7395             -
# 3  Extended offline  Completed without error  00%        6253             -
# 4  Short offline     Completed without error  00%        4030             -
# 5  Extended offline  Completed without error  00%        1568             -
# 6  Extended offline  Completed without error  00%        1434             -

Selective Self-tests/Logging not supported

localhost:~ # btrfs fi usage /mnt
Overall:
    Device size:          450.00GiB
    Device allocated:     424.04GiB
    Device unallocated:    25.96GiB
    Device missing:           0.00B
    Used:                 420.38GiB
    Free (estimated):      27.39GiB   (min: 27.39GiB)
    Data ratio:                1.00
    Metadata ratio:            1.00
    Global reserve:       512.00MiB   (used: 0.00B)

Data,single: Size:411.98GiB, Used:410.55GiB
   /dev/sda12   411.98GiB

Metadata,single: Size:12.00GiB, Used:9.83GiB
   /dev/sda12    12.00GiB

System,single: Size:64.00MiB, Used:64.00KiB
   /dev/sda12    64.00MiB

Unallocated:
   /dev/sda12    25.96GiB

The filesystem had become pretty full; I had planned to enlarge the Btrfs partition before it became corrupt. Active kernel when the filesystem went read-only: OpenSUSE Linux 4.14.14-1.geef6178-default, from the http://download.opensuse.org/repositories/Kernel:/stable/standard/stable repository.
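As a quick sanity check of the btrfs fi usage figures above, and of how full the filesystem really was, here is a small Python calculation. It assumes, only because the numbers happen to match, that "Free (estimated)" is the unallocated space plus the unused part of the already-allocated data chunks (data ratio 1.00); this is my reading of the output, not something taken from the btrfs-progs source.

```python
# Figures copied from the `btrfs fi usage /mnt` output above, in GiB.
device_size = 450.00
unallocated = 25.96
data_size, data_used = 411.98, 410.55
metadata_size = 12.00
system_size = 64.00 / 1024  # 64.00MiB expressed in GiB

# Allocated space should be the sum of the data, metadata and system chunks.
allocated = data_size + metadata_size + system_size

# Assumption: with data ratio 1.00 the free estimate is the unallocated space
# plus whatever is still unused inside the allocated data chunks.
free_estimated = unallocated + (data_size - data_used)

print(f"allocated        ~ {allocated:.2f} GiB")
print(f"unallocated      ~ {device_size - allocated:.2f} GiB")
print(f"free (estimated) ~ {free_estimated:.2f} GiB")
print(f"unallocated share of device: {unallocated / device_size:.1%}")
```

The sums reproduce the reported 424.04 GiB allocated, 25.96 GiB unallocated and 27.39 GiB estimated free, so less than 6 % of the 450 GiB partition was still unallocated when the corruption happened.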
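To make the check and dump-tree output a little easier to follow, here is a minimal Python sketch that parses the three item headers quoted from leaf 690436964352 and applies two basic leaf rules: keys must be strictly increasing as (objectid, type, offset) tuples, and item data is packed contiguously downwards from the end of the leaf. It also flags extent items smaller than a struct btrfs_extent_item (refs, generation and flags, 8 bytes each). The regex, the helper names and the 24-byte minimum are illustrative assumptions, not btrfs-progs code. Item 160 is not in the quoted excerpt, so the "bad key ordering 159 160" error itself cannot be reproduced here.

```python
import re
from typing import List, NamedTuple, Tuple

# Matches item header lines as printed by `btrfs inspect-internal dump-tree`.
ITEM_RE = re.compile(
    r"item (\d+) key \((\d+) (\w+) (\d+)\) itemoff (\d+) itemsize (\d+)"
)

# Numeric btrfs key types; only the ones needed for this extent-tree excerpt.
KEY_TYPES = {"EXTENT_ITEM": 168, "METADATA_ITEM": 169}

# Assumption: every extent item must hold at least a struct btrfs_extent_item
# (refs, generation and flags, 8 bytes each).
MIN_EXTENT_ITEM_SIZE = 24


class Item(NamedTuple):
    slot: int
    key: Tuple[int, int, int]  # (objectid, type, offset)
    itemoff: int
    itemsize: int


def parse_items(lines: List[str]) -> List[Item]:
    """Collect item headers; raises KeyError for key types not in KEY_TYPES."""
    items = []
    for line in lines:
        m = ITEM_RE.search(line)
        if m:
            slot, objectid, ktype, offset, itemoff, itemsize = m.groups()
            key = (int(objectid), KEY_TYPES[ktype], int(offset))
            items.append(Item(int(slot), key, int(itemoff), int(itemsize)))
    return items


def check_leaf(items: List[Item]) -> List[str]:
    problems = []
    for prev, cur in zip(items, items[1:]):
        # Keys within a leaf must be strictly increasing.
        if not prev.key < cur.key:
            problems.append(f"bad key ordering {prev.slot} {cur.slot}")
        # Item data grows down from the end of the leaf, so each item's data
        # should end exactly where the previous item's data begins.
        if cur.itemoff + cur.itemsize != prev.itemoff:
            problems.append(f"item {cur.slot}: data not contiguous")
    for item in items:
        if item.itemsize < MIN_EXTENT_ITEM_SIZE:
            problems.append(f"item {item.slot}: itemsize {item.itemsize} too small")
    return problems


def extent_end(item: Item) -> int:
    # For EXTENT_ITEM keys, the key offset is the extent length in bytes.
    return item.key[0] + item.key[2]


# The three item headers quoted from the dump-tree output above.
EXCERPT = """\
item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
"""

if __name__ == "__main__":
    leaf = parse_items(EXCERPT.splitlines())
    for problem in check_leaf(leaf):
        print(problem)
    print("last extent ends at", extent_end(leaf[-1]))
```

On this excerpt the only problem flagged is item 159 with itemsize 0, which also seems to be what makes dump-tree hit its BUG_ON. Item 159's extent (22732533760 + 16384) ends at 22732550144, exactly where the csum range without extent records starts, so the 2191065088 bytes up to 24923615232 appear to have lost their extent items.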
Fstab mount options: noatime,autodefrag (for a period in the past, with older kernels, I also used the nossd option on this filesystem). If it matters, I have run duperemove on the filesystem many times since it was created. To test the RAM, I ran the mprime Blend test for 24 hours after the corruption, without any errors or warnings.

Is there a way I can try to repair this filesystem without having to recreate it and reinstall the operating system? Reinstalling all currently installed packages and restoring all current system settings would probably take me quite some time. If it is currently not repairable, it would be nice if this kind of corruption could be repaired in the future, even at the cost of losing a few files, or if such corruption could be avoided in the first place.

Laptop: Asus N56JR-S4075H, bought new in 2014
Hard drive: for the last 14 months a SanDisk X400 SD8SB8U1T001122 1TB SSD; originally a Seagate ST750LM000 SSHD
RAM (lshw):
  *-memory
       description: System Memory
       physical id: c
       slot: System board or motherboard
       size: 12GiB
     *-bank:0
          description: SODIMM DDR3 Synchronous 1600 MHz (0,6 ns)
          product: ASU16D3LS1KBG/4G
          vendor: Kingston
          physical id: 0
          serial: C32D5655
          slot: ChannelA-DIMM0
          size: 4GiB
          width: 64 bits
          clock: 1600MHz (0.6ns)
     *-bank:1
          description: DIMM [empty]
          product: [Empty]
          vendor: [Empty]
          physical id: 1
          serial: [Empty]
          slot: ChannelA-DIMM1
     *-bank:2
          description: SODIMM DDR3 Synchronous 1600 MHz (0,6 ns)
          product: M471B1G73QH0-YK0
          vendor: Samsung
          physical id: 2
          serial: 1519AD27
          slot: ChannelB-DIMM0
          size: 8GiB
          width: 64 bits
          clock: 1600MHz (0.6ns)
     *-bank:3
          description: DIMM [empty]
          product: [Empty]
          vendor: [Empty]
          physical id: 3
          serial: [Empty]
          slot: ChannelB-DIMM1
CPU: Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
BIOS version: N56JRH.202
SSD partitions (among others): Btrfs with the OpenSUSE Tumbleweed installation, NTFS with Windows 10, Ext4 with a Fedora installation.
I have never noticed any corruption on the NTFS and Ext4 filesystems on the laptop, only on the Btrfs filesystems.

Best regards,
Claes Fransson