Hi!

I really like the features of Btrfs, especially deduplication,
snapshotting and checksumming. However, while using it on my laptop
over the last couple of years, it has become corrupted many times.
Sometimes I have managed to fix the problems (at least well enough that
I could continue to use the filesystem) with check --repair, but several
times I had to recreate the filesystem and reinstall the operating
system.

I am guessing the corruptions might be the result of unclean
shutdowns, mostly after system hangs, but sometimes also after the
battery ran out.
Furthermore, the power LED has recently started blinking (even when
the power cable is plugged in), I guess because the battery is old and
worn out. Maybe the current corruption has something to do with this
as well? However, during the last year I have almost always run with
the power cable plugged in, and have only been on battery for a few
seconds a few times when moving the laptop.

Currently, I can only mount the filesystem read-only; it goes
read-only automatically if I try to mount it normally.

When booting an OpenSUSE Tumbleweed-20180119 live-iso:
localhost:~ # uname -r
4.14.13-1-default
localhost:~ # btrfs --version
btrfs-progs v4.14.1

localhost:~ # btrfs check -p /dev/sda12
Checking filesystem on /dev/sda12
UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
bad key ordering 159 160
bad block 690436964352
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache [.]
checking fs roots [o]
checking csums
bad key ordering 159 160
Error looking up extent record -1
Right section didn't have a record
There are no extents for csum range 22732550144-24923615232
Csum exists for 16303538176-24923615232 but there is no extent record
ERROR: errors found in csum tree
found 344063430663 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 453410816
total fs tree bytes: 0
total extent tree bytes: 452952064
btree space waste bytes: 140165932
file data blocks allocated: 108462080
 referenced 108462080

localhost:~ # btrfs inspect-internal dump-tree -b 690436964352 /dev/sda12
btrfs-progs v4.14.1
leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
.
.
.
        item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
                refs 1 gen 821 flags DATA
                extent data backref root 287 objectid 51665 offset 0 count 1
        item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
                refs 1 gen 821 flags DATA
                extent data backref root 287 objectid 51666 offset 0 count 1
        item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
btrfs(+0x365c6)[0x55bdfaada5c6]
btrfs(print_extent_item+0x424)[0x55bdfaadb284]
btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
btrfs(main+0x7d)[0x55bdfaac7d4d]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
btrfs(_start+0x2a)[0x55bdfaac7e5a]
Aborted (core dumped)
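
The numbers in the two outputs above appear to line up: the extents of
items 157 to 159 are back to back, and the csum range that btrfs check
reports as having no extents starts exactly where the extent of item
159 ends. Item 159 also has itemsize 0 and the same itemoff as item
158. Below is a minimal Python sketch of that arithmetic. The
dictionary layout and the helper name extent_end are made up for
illustration (they are not btrfs-progs structures), and it assumes the
usual meaning of an EXTENT_ITEM key, (start byte, EXTENT_ITEM, length).
It only cross-checks the printed values and says nothing about the
cause of the damage.

```python
# Values copied from the dump-tree output above:
# item number -> (objectid, type, offset, itemoff, itemsize).
# Assumption: for an EXTENT_ITEM key, objectid is the extent's start
# byte and offset is its length in bytes.
items = {
    157: (22732500992, "EXTENT_ITEM", 16384, 6538, 53),
    158: (22732517376, "EXTENT_ITEM", 16384, 6485, 53),
    159: (22732533760, "EXTENT_ITEM", 16384, 6485, 0),
}

# Start of the csum range that btrfs check reported as having no extents.
csum_gap_start = 22732550144


def extent_end(item):
    """Return the first byte after the extent described by the key."""
    start, _type, length, _itemoff, _itemsize = item
    return start + length


# Each extent ends exactly where the next one starts.
contiguous = (
    extent_end(items[157]) == items[158][0]
    and extent_end(items[158]) == items[159][0]
)

# The reported csum gap begins right after the extent of item 159.
gap_follows_item_159 = extent_end(items[159]) == csum_gap_start

# Items whose payload size is zero cannot hold a valid extent item.
zero_size_items = [n for n, it in items.items() if it[4] == 0]

# Items that share a data offset with the item before them.
shared_offset_items = [
    n for n in sorted(items) if n - 1 in items and items[n][3] == items[n - 1][3]
]

print("contiguous:", contiguous)
print("gap follows item 159:", gap_follows_item_159)
print("zero-size items:", zero_size_items)
print("items sharing itemoff with predecessor:", shared_offset_items)
```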


btrfs check --repair hangs after reporting "bad key ordering 159 160",
with no disk activity but constant high CPU usage.

localhost:~ # smartctl -a /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.13-1-default] (SUSE RPM)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SD8SB8U1T001122
Serial Number:    163076421231
LU WWN Device Id: 5 001b44 4a4dde388
Firmware Version: X4140000
User Capacity:    1,024,209,543,168 bytes [1.02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jan 22 15:28:46 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       7692
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       496
165 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       1112516724361
166 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       1
167 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       25
168 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       44
169 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       753
170 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       18
174 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       57
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   061   062   ---    Old_age   Always       -       39 (Min/Max 9/62)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Unknown_SSD_Attribute   0x0032   100   100   ---    Old_age   Always       -       4733091251278
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 Media_Wearout_Indicator 0x0032   100   100   ---    Old_age   Always       -       19202
234 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       32167
241 Total_LBAs_Written      0x0030   253   253   ---    Old_age   Offline      -       22520
242 Total_LBAs_Read         0x0030   253   253   ---    Old_age   Offline      -       183882
244 Unknown_Attribute       0x0032   000   100   ---    Old_age   Always       -       0
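
To make the error-related counters in the table above easier to scan,
here is a small Python sketch that parses a handful of the rows and
lists the ones with a non-zero raw value. The selection of rows, the
names SMART_ROWS and parse_row, and the one-row-per-line layout are my
own choices for illustration; the values themselves are copied from the
smartctl output. Whether a single Command_Timeout matters here is an
open question, and the sketch does not try to answer it.

```python
# A few error-related rows from the smartctl attribute table above,
# written one attribute per line.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   061   062   ---    Old_age   Always       -       39 (Min/Max 9/62)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
"""


def parse_row(line):
    """Split one attribute row into (id, name, raw value as int)."""
    # The first nine columns never contain spaces; the raw value may.
    fields = line.split(None, 9)
    attr_id = int(fields[0])
    name = fields[1]
    raw = int(fields[9].split()[0])  # keep only the leading number
    return attr_id, name, raw


rows = [parse_row(line) for line in SMART_ROWS.splitlines() if line.strip()]

# Temperature is not an error counter, so leave it out of the check.
nonzero_counters = {
    name: raw for _id, name, raw in rows
    if name != "Temperature_Celsius" and raw != 0
}

print("non-zero error counters:", nonzero_counters)
```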

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7570         -
# 2  Extended offline    Completed without error       00%      7395         -
# 3  Extended offline    Completed without error       00%      6253         -
# 4  Short offline       Completed without error       00%      4030         -
# 5  Extended offline    Completed without error       00%      1568         -
# 6  Extended offline    Completed without error       00%      1434         -

Selective Self-tests/Logging not supported

localhost:~ # btrfs fi usage /mnt
Overall:
    Device size:                 450.00GiB
    Device allocated:            424.04GiB
    Device unallocated:           25.96GiB
    Device missing:                  0.00B
    Used:                        420.38GiB
    Free (estimated):             27.39GiB      (min: 27.39GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:411.98GiB, Used:410.55GiB
   /dev/sda12    411.98GiB

Metadata,single: Size:12.00GiB, Used:9.83GiB
   /dev/sda12     12.00GiB

System,single: Size:64.00MiB, Used:64.00KiB
   /dev/sda12     64.00MiB

Unallocated:
   /dev/sda12     25.96GiB

The filesystem had become pretty full; I had planned to increase the
size of the Btrfs partition before it became corrupted.
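
To put a number on "pretty full", here is a minimal Python sketch
that turns the figures from the btrfs fi usage output above into
percentages. The variable names and the helper percent are made up for
illustration; the GiB values are copied from that output.

```python
# Figures copied from the "btrfs fi usage /mnt" output above, in GiB.
device_size = 450.00
device_allocated = 424.04
data_size, data_used = 411.98, 410.55
metadata_size, metadata_used = 12.00, 9.83


def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Share of the device already handed out to data/metadata/system chunks.
allocated_pct = percent(device_allocated, device_size)
# How full the allocated data and metadata chunks are.
data_pct = percent(data_used, data_size)
metadata_pct = percent(metadata_used, metadata_size)

print(f"device allocated: {allocated_pct}%")
print(f"data chunks used: {data_pct}%")
print(f"metadata chunks used: {metadata_pct}%")
```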

Active kernel when the filesystem went read-only: OpenSUSE Linux
4.14.14-1.geef6178-default, from the
http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
repository.

Fstab mount options: noatime,autodefrag (for a period in the past,
with older kernels, I also used the nossd option on this filesystem).

If it matters, I have been running duperemove many times on the
filesystem since creation.

To test the RAM, I ran the mprime Blend test for 24 hours after the
corruption, without any error or warning.

Is there a way I can try to repair this filesystem without having to
recreate it and reinstall the operating system? Reinstalling all
currently installed packages and restoring all current system settings
would probably take me quite some time.
If it is currently not repairable, it would be nice if this kind of
corruption could be repaired in the future, even at the cost of losing
a few files, or if such corruption could be avoided in the first place.

Laptop: Asus N56JR-S4075H, bought new 2014
Hard drive: for the last 14 months a SanDisk X400 SD8SB8U1T001122 1TB
SSD; originally a Seagate ST750LM000 SSHD
RAM (from lshw):
     *-memory
          description: System Memory
          physical id: c
          slot: System board or motherboard
          size: 12GiB
        *-bank:0
             description: SODIMM DDR3 Synchronous 1600 MHz (0,6 ns)
             product: ASU16D3LS1KBG/4G
             vendor: Kingston
             physical id: 0
             serial: C32D5655
             slot: ChannelA-DIMM0
             size: 4GiB
             width: 64 bits
             clock: 1600MHz (0.6ns)
        *-bank:1
             description: DIMM [empty]
             product: [Empty]
             vendor: [Empty]
             physical id: 1
             serial: [Empty]
             slot: ChannelA-DIMM1
        *-bank:2
             description: SODIMM DDR3 Synchronous 1600 MHz (0,6 ns)
             product: M471B1G73QH0-YK0
             vendor: Samsung
             physical id: 2
             serial: 1519AD27
             slot: ChannelB-DIMM0
             size: 8GiB
             width: 64 bits
             clock: 1600MHz (0.6ns)
        *-bank:3
             description: DIMM [empty]
             product: [Empty]
             vendor: [Empty]
             physical id: 3
             serial: [Empty]
             slot: ChannelB-DIMM1
CPU: Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
BIOS version: N56JRH.202
SSD Partitions (among others): Btrfs with OpenSUSE Tumbleweed
installation, NTFS with Windows 10, Ext4 with Fedora installation.

I have never noticed any corruptions on the NTFS and Ext4 file systems
on the laptop, only on the Btrfs file systems.

Best regards,
Claes Fransson