Kernel 4.4.0-rc7
System is Macbook with 109GB btrfs partition on SSD

System crashed (nothing in syslog, could have been btrfs or possibly GPU fault
shortly after running xrandr). After hard reset the btrfs partition was corrupt
and would not mount. I took an image of the partition (some of the output below
refers to sda5, some to loop0, it is the same image)

# dmesg of mount failure (using kernel 4.2 from an Ubuntu 15.10 recovery drive 
but same on 4.4.0-rc7): 

[ 1969.425321] BTRFS critical (device sda5): corrupt leaf, slot offset bad: 
block=67239936,root=1, slot=82
[ 1969.428809] BTRFS critical (device sda5): corrupt leaf, slot offset bad: 
block=67239936,root=1, slot=82
[ 1969.431981] ------------[ cut here ]------------
[ 1969.432018] WARNING: CPU: 2 PID: 11162 at 
/build/linux-cRemOf/linux-4.2.0/fs/btrfs/extent-tree.c:6264 
__btrfs_free_extent.isra.69+0x2ef/0xd70 [btrfs]()
[ 1969.432028] BTRFS: Transaction aborted (error -5)
[ 1969.432030] Modules linked in: drbg ansi_cprng ctr ccm intel_rapl iosf_mbi 
x86_pkg_temp_thermal intel_powerclamp btrfs arc4 xor coretemp rt2800usb 
rt2x00usb raid6_pq b43 rt2800lib rt2x00lib kvm_intel mac80211 cfg80211 btusb 
btrtl btbcm btintel bluetooth uvcvideo kvm ssb crc_ccitt crct10dif_pclmul 
crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_cirrus snd_hda_codec_generic 
videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev 
snd_hda_intel snd_hda_codec snd_hda_core aesni_intel aes_x86_64 lrw gf128mul 
glue_helper applesmc snd_hwdep snd_pcm snd_timer snd soundcore joydev 
input_polldev media ablk_helper cryptd bcm5974 input_leds lpc_ich bcma mei_me 
mei thunderbolt sbs sbshc acpi_als kfifo_buf apple_gmux industrialio mac_hid 
shpchp apple_bl autofs4 hid_generic i915 hid_apple sdhci_pci
[ 1969.432115]  i2c_algo_bit ahci uas drm_kms_helper libahci sdhci usb_storage 
usbhid drm video hid
[ 1969.432132] CPU: 2 PID: 11162 Comm: mount Tainted: G        W       
4.2.0-22-generic #27-Ubuntu
[ 1969.432136] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B07.1501071215 01/07/2015
[ 1969.432139]  0000000000000000 00000000037e909d ffff88026267b5d8 
ffffffff817e94c9
[ 1969.432145]  0000000000000000 ffff88026267b630 ffff88026267b618 
ffffffff8107b3d6
[ 1969.432152]  ffff88026267b608 0000000e5297c000 00000000fffffffb 
0000000000000000
[ 1969.432157] Call Trace:
[ 1969.432167]  [<ffffffff817e94c9>] dump_stack+0x45/0x57
[ 1969.432175]  [<ffffffff8107b3d6>] warn_slowpath_common+0x86/0xc0
[ 1969.432181]  [<ffffffff8107b465>] warn_slowpath_fmt+0x55/0x70
[ 1969.432199]  [<ffffffffc086837f>] __btrfs_free_extent.isra.69+0x2ef/0xd70 
[btrfs]
[ 1969.432229]  [<ffffffffc08d1e5a>] ? find_ref_head+0x5a/0x80 [btrfs]
[ 1969.432248]  [<ffffffffc086c718>] __btrfs_run_delayed_refs+0x988/0x1080 
[btrfs]
[ 1969.432268]  [<ffffffffc086fb1e>] btrfs_run_delayed_refs.part.73+0x6e/0x270 
[btrfs]
[ 1969.432284]  [<ffffffffc0859ec3>] ? btrfs_set_path_blocking+0x43/0x80 [btrfs]
[ 1969.432306]  [<ffffffffc086fd35>] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
[ 1969.432326]  [<ffffffffc0884756>] btrfs_commit_transaction+0x56/0xb20 [btrfs]
[ 1969.432332]  [<ffffffff811dde9f>] ? kmem_cache_free+0x1cf/0x1e0
[ 1969.432356]  [<ffffffffc08c9e0d>] btrfs_recover_log_trees+0x3ed/0x490 [btrfs]
[ 1969.432378]  [<ffffffffc08c5890>] ? replay_one_extent+0x6a0/0x6a0 [btrfs]
[ 1969.432397]  [<ffffffffc0881b21>] open_ctree+0x19b1/0x23e0 [btrfs]
[ 1969.432412]  [<ffffffffc0856fae>] btrfs_mount+0x94e/0xa70 [btrfs]
[ 1969.432420]  [<ffffffff813da845>] ? find_next_bit+0x15/0x20
[ 1969.432427]  [<ffffffff811a5e15>] ? pcpu_alloc+0x385/0x670
[ 1969.432434]  [<ffffffff812011c8>] mount_fs+0x38/0x160
[ 1969.432439]  [<ffffffff811a6135>] ? __alloc_percpu+0x15/0x20
[ 1969.432446]  [<ffffffff8121cfab>] vfs_kern_mount+0x6b/0x120
[ 1969.432463]  [<ffffffffc0856848>] btrfs_mount+0x1e8/0xa70 [btrfs]
[ 1969.432469]  [<ffffffff811a5e15>] ? pcpu_alloc+0x385/0x670
[ 1969.432475]  [<ffffffff812011c8>] mount_fs+0x38/0x160
[ 1969.432481]  [<ffffffff811a6135>] ? __alloc_percpu+0x15/0x20
[ 1969.432486]  [<ffffffff8121cfab>] vfs_kern_mount+0x6b/0x120
[ 1969.432493]  [<ffffffff8121f7b6>] do_mount+0x246/0xd10
[ 1969.432498]  [<ffffffff811a08fe>] ? strndup_user+0x4e/0xb0
[ 1969.432503]  [<ffffffff811a0876>] ? memdup_user+0x46/0x80
[ 1969.432510]  [<ffffffff812205af>] SyS_mount+0x9f/0x100
[ 1969.432519]  [<ffffffff817f02b2>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 1969.432523] ---[ end trace 7a560cc73341e0d1 ]---
[ 1969.432528] BTRFS: error (device sda5) in __btrfs_free_extent:6264: errno=-5 
IO failure
[ 1969.436576] BTRFS: error (device sda5) in btrfs_run_delayed_refs:2781: 
errno=-5 IO failure
[ 1969.442036] BTRFS: error (device sda5) in btrfs_replay_log:2375: errno=-5 IO 
failure (Failed to recover log tree)
[ 1969.446087] BTRFS error (device sda5): cleaner transaction attach returned 
-30
[ 1969.486477] BTRFS: open_ctree failed


# btrfsck:

checking filesystem on sda5
UUID: b1103526-98a3-4b40-a782-cf66721ed600
checking extents
incorrect offsets 11897 5713478
bad block 67239936
Errors found in extent allocation tree or chunk allocation
checking free space cache
There is no free space entry for 61515563008-62297997312
cache appears valid but isnt 61224255488
found 7839154285 bytes used err is -22
total csum bytes: 0
total tree bytes: 7454720
total fs tree bytes: 0
total extent tree bytes: 7356416
btree space waste bytes: 2454021
file data blocks allocated: 28508160
 referenced 28508160


# btrfs-image (compiled v4.3.1 from git:.../kdave/btrfs-progs) segfaults:

Program received signal SIGSEGV, Segmentation fault.
copy_from_extent_tree (path=0x64e420, metadump=0x7fffffffdf90) at 
btrfs-image.c:1265
1265                            if (btrfs_extent_flags(leaf, ei) &
(gdb)
(gdb) bt
#0  copy_from_extent_tree (path=0x64e420, metadump=0x7fffffffdf90) at 
btrfs-image.c:1265
#1  create_metadump (walk_trees=<optimised out>, sanitize=<optimised out>, 
compress_level=<optimised out>, num_threads=0, out=0x64e010, 
input=0x7fffffffe3fe "/dev/sda5") at btrfs-image.c:1364
#2  main (argc=2, argv=0x7fffffffe198) at btrfs-image.c:2804


# After btrfs-zero-log the partition mounts but with errors:

[47995.081106] BTRFS: checking UUID tree
[48030.691448] BTRFS critical (device loop0): corrupt leaf, slot offset bad: 
block=67239936,root=1, slot=82
[48030.794656] BTRFS critical (device loop0): corrupt leaf, slot offset bad: 
block=67239936,root=1, slot=82
[49304.640667] BTRFS critical (device loop0): corrupt leaf, slot offset bad: 
block=67239936,root=1, slot=82
[49304.641861] ------------[ cut here ]------------
[49304.641888] WARNING: CPU: 0 PID: 3829 at 
/build/linux-cRemOf/linux-4.2.0/fs/btrfs/extent-tree.c:6264 
__btrfs_free_extent.isra.69+0x2ef/0xd70 [btrfs]()
[49304.641890] BTRFS: Transaction aborted (error -5)
[49304.641892] Modules linked in: btrfs xor raid6_pq arc4 rtl8723be btcoexist 
intel_rapl rtl8723_common rtl_pci intel_soc_dts_iosf rtlwifi mac80211 
intel_powerclamp cfg80211 snd_intel_sst_acpi snd_soc_rt5640 snd_intel_sst_core 
joydev input_leds snd_hda_codec_hdmi snd_soc_rl6231 snd_hda_codec_conexant 
snd_soc_sst_mfld_platform snd_hda_codec_generic rtsx_usb_ms coretemp kvm 
punit_atom_debug snd_soc_core memstick serio_raw crct10dif_pclmul crc32_pclmul 
cryptd snd_compress dw_dmac snd_hda_intel lpc_ich dw_dmac_core btusb btrtl 
btbcm btintel bluetooth ac97_bus iosf_mbi uvcvideo videobuf2_vmalloc 
videobuf2_memops videobuf2_core v4l2_common videodev media snd_hda_codec 
mei_txe shpchp snd_hda_core mei snd_hwdep snd_pcm_dmaengine rfkill_gpio 
8250_fintek ideapad_laptop intel_smartconnect sparse_keymap snd_pcm
[49304.641945]  snd_timer snd soundcore snd_soc_sst_acpi 
i2c_designware_platform pwm_lpss_platform 8250_dw i2c_designware_core 
spi_pxa2xx_platform pwm_lpss mac_hid autofs4 rtsx_usb_sdmmc rtsx_usb uas 
usb_storage psmouse i915 i2c_algo_bit r8169 drm_kms_helper mii drm ahci libahci 
video i2c_hid hid sdhci_acpi sdhci
[49304.641971] CPU: 0 PID: 3829 Comm: btrfs-transacti Tainted: G        W       
4.2.0-22-generic #27-Ubuntu
[49304.641973] Hardware name: LENOVO 80G0/Lancer 5A6, BIOS A7CN43WW 10/21/2014
[49304.641976]  0000000000000000 00000000dfc9f16b ffff880122ddfa98 
ffffffff817e94c9
[49304.641980]  0000000000000000 ffff880122ddfaf0 ffff880122ddfad8 
ffffffff8107b3d6
[49304.641983]  ffff880122ddfac8 0000000e52860000 00000000fffffffb 
0000000000000000
[49304.641987] Call Trace:
[49304.641995]  [<ffffffff817e94c9>] dump_stack+0x45/0x57
[49304.642000]  [<ffffffff8107b3d6>] warn_slowpath_common+0x86/0xc0
[49304.642004]  [<ffffffff8107b465>] warn_slowpath_fmt+0x55/0x70
[49304.642017]  [<ffffffffc07f737f>] __btrfs_free_extent.isra.69+0x2ef/0xd70 
[btrfs]
[49304.642035]  [<ffffffffc0860e5a>] ? find_ref_head+0x5a/0x80 [btrfs]
[49304.642049]  [<ffffffffc07fb718>] __btrfs_run_delayed_refs+0x988/0x1080 
[btrfs]
[49304.642063]  [<ffffffffc07feb1e>] btrfs_run_delayed_refs.part.73+0x6e/0x270 
[btrfs]
[49304.642081]  [<ffffffffc08750a3>] ? btrfs_scrub_pause+0xe3/0x110 [btrfs]
[49304.642094]  [<ffffffffc07fed35>] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
[49304.642109]  [<ffffffffc0813c79>] btrfs_commit_transaction+0x579/0xb20 
[btrfs]
[49304.642124]  [<ffffffffc080ee19>] transaction_kthread+0x229/0x240 [btrfs]
[49304.642139]  [<ffffffffc080ebf0>] ? btrfs_cleanup_transaction+0x550/0x550 
[btrfs]
[49304.642143]  [<ffffffff8109a868>] kthread+0xd8/0xf0
[49304.642147]  [<ffffffff8109a790>] ? kthread_create_on_node+0x1f0/0x1f0
[49304.642151]  [<ffffffff817f06df>] ret_from_fork+0x3f/0x70
[49304.642154]  [<ffffffff8109a790>] ? kthread_create_on_node+0x1f0/0x1f0
[49304.642156] ---[ end trace bba14eeb59d201a5 ]---
[49304.642160] BTRFS: error (device loop0) in __btrfs_free_extent:6264: 
errno=-5 IO failure
[49304.643386] BTRFS info (device loop0): forced readonly
[49304.643391] BTRFS: error (device loop0) in btrfs_run_delayed_refs:2781: 
errno=-5 IO failure
[49304.644628] BTRFS warning (device loop0): Skipping commit of aborted 
transaction.
[49304.644632] BTRFS: error (device loop0) in cleanup_transaction:1710: 
errno=-5 IO failure


# btrfs fi show /mnt/x:

Label: 'LINUX'  uuid: b1103526-98a3-4b40-a782-cf66721ed600
        Total devices 1 FS bytes used 84.16GiB
        devid    1 size 108.80GiB used 108.80GiB path /dev/loop0


# btrfs fi df /mnt/x:

Data, single: total=106.79GiB, used=83.00GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=2.01GiB, used=1.16GiB
GlobalReserve, single: total=400.00MiB, used=0.00B


# btrfsck --repair:

enabling repair mode
Checking filesystem on /dev/sda5
UUID: b1103526-98a3-4b40-a782-cf66721ed600
checking extents
incorrect offsets 11897 5713478
Deleting bogus item [61515546624,168,16384] at slot 83 on block 4243456
Shifting item nr 83 by 275 bytes in block 4243456
items overlap, can't fix
cmds-check.c:4118: fix_item_offset: Assertion `ret` failed.
btrfs check[0x426e9c]
btrfs check[0x427114]
btrfs check[0x427d2d]
btrfs check(cmd_check+0x15bb)[0x42b203]
btrfs check(main+0x155)[0x409eae]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff6aacb2a40]
btrfs check(_start+0x29)[0x409a69]


# btrfsck --repair --init-extent-tree:

[takes a while, over 3 million lines of output... small extract of end follows]

Backref 123269292032 root 257 owner 1089423 offset 17063936 num_refs 0 not 
found in extent tree
Incorrect local backref count on 123269292032 root 257 owner 1089423 offset 
17063936 found 1 wanted 0 back 0x25c5ae0
Backref 123269292032 parent 61250600960 owner 0 offset 0 num_refs 0 not found 
in extent tree
Incorrect local backref count on 123269292032 parent 61250600960 owner 0 offset 
0 found 1 wanted 0 back 0xdadeb40
backpointer mismatch on [123269292032 4096]
ref mismatch on [123269373952 12288] extent item 0, found 3
adding new data backref on 123269373952 root 257 owner 184332 offset 356352 
found 1
adding new data backref on 123269373952 parent 61254959104 owner 0 offset 0 
found 1
adding new data backref on 123269373952 parent 63946752 owner 0 offset 0 found 1
Backref 123269373952 root 257 owner 184332 offset 356352 num_refs 0 not found 
in extent tree
Incorrect local backref count on 123269373952 root 257 owner 184332 offset 
356352 found 1 wanted 0 back 0x258a4e0
Backref 123269373952 parent 61254959104 owner 0 offset 0 num_refs 0 not found 
in extent tree
Incorrect local backref count on 123269373952 parent 61254959104 owner 0 offset 
0 found 1 wanted 0 back 0xdb085a0
Backref 123269373952 parent 63946752 owner 0 offset 0 num_refs 0 not found in 
extent tree
Incorrect local backref count on 123269373952 parent 63946752 owner 0 offset 0 
found 1 wanted 0 back 0xde73e50
backpointer mismatch on [123269373952 12288]
ref mismatch on [123269513216 28672] extent item 0, found 1
adding new data backref on 123269513216 root 260 owner 407786 offset 559493120 
found 1
Backref 123269513216 root 260 owner 407786 offset 559493120 num_refs 0 not 
found in extent tree
Incorrect local backref count on 123269513216 root 260 owner 407786 offset 
559493120 found 1 wanted 0 back 0xca364a0
backpointer mismatch on [123269513216 28672]
repaired damaged extent references
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 1168523045 bytes used err is 0
total csum bytes: 84470212
total tree bytes: 1167949824
total fs tree bytes: 1050476544
total extent tree bytes: 16384
btree space waste bytes: 231805196
file data blocks allocated: 209697452032
 referenced 92010127360


# btrfs scrub then showed only a few files with bad checksums:

       - 5.8GB @jessie/var/log/syslog.1
       - 308MB @jessie/var/log/syslog.1.gz
       - 815MB @tmp/linux/.git/objects/pack/tmp_pack_M544ZX 

The timestamps show these files are several months old so were probably not
open at the time of the crash.

After removing these files no other errors are reported.

The recovered partition then successfully booted, but the next boot failed with
drive full errors. It looks like a btrfs metadata issue:

Data, single: total=106.79GiB, used=82.01GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=2.01GiB, used=1.51GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Even though the original pre-recovery crashed paritition showed 1.16GiB/2.01GiB
metadata used, it appears used metadata can quickly grow. 

Deleting some subvolumes brought the metadata used down to 1.02GiB but:

# btrfs balance start -dlimit=5 results in "ERROR: error during balancing '/' - 
No space left on device"

Suggests that even with a used/total metadata of 1.02GiB/2.01GiB there are
still issues. Perhaps the used portion can quickly grow to the point where it
causes problems?

Repeating balance again, starting with dlimit=1 then dlimit=2 then dlimit=5,
was successful with 8 chunks reallocated and "btrfs fi show /" showing
103.02GiB/108.80GiB allocated.

I have an image of the crashed partition so let me know if any further analysis
would be useful. I would guess that the low free metadata may have been a
factor in the crash; it was enough of an issue to cause the post-recovery
system to be unable to boot properly.

FWIW I found several other reports of unbootable btrfs with the same "corrupt
leaf, slot offset bad" error, some also reporting that it occurred after a
crash that required a hard reset:

https://bugzilla.kernel.org/show_bug.cgi?id=74611
http://unix.stackexchange.com/questions/195888/how-to-repair-a-btrfs-block
http://www.spinics.net/lists/linux-btrfs/msg48167.html
http://ubuntuforums.org/showthread.php?t=2301288
https://lists.archlinux.org/pipermail/arch-general/2014-April/035885.html
http://comments.gmane.org/gmane.comp.file-systems.btrfs/34190
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to