On Fri, Sep 16, 2016 at 03:25:00PM -0400, Sean Greenslade wrote:
> Hi, all. I've been playing around with an old laptop of mine, and I
> figured I'd use it as a learning / bugfinding opportunity. Its /home
> partition was originally ext3. I have a full partition image of this
> drive as a backup, so I can do (and have done) potentially destructive
> things. The system disk is a ~6 year old SSD.
> 
> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
> and ran a simple btrfs-convert on it. After patching up the fstab and
> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
> full balance, ran a full defrag, and rebooted again. I then decided to
> try (as an experiment) using DUP mode for data and metadata. I ran that
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
> 
> [ 7316.764235] ------------[ cut here ]------------
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg 
> ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc 
> videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic 
> iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common 
> ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse 
> input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 
> snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore 
> shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm 
> sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop 
> sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw 
> atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
> [ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt 
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G           O    4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903    11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7316.764513]  0000000000000286 000000006101f47d ffff8800230dbc78 ffffffff812f0215
> [ 7316.764522]  ffff8800230dbcc8 0000000000000000 ffff8800230dbcb8 ffffffff8107ae6f
> [ 7316.764530]  00000b8a00000035 ffff88007791afa8 ffff8800751d9000 ffff880014101d40
> [ 7316.764538] Call Trace:
> [ 7316.764551]  [<ffffffff812f0215>] dump_stack+0x63/0x8e
> [ 7316.764560]  [<ffffffff8107ae6f>] __warn+0xcf/0xf0
> [ 7316.764567]  [<ffffffff8107aef1>] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605]  [<ffffffffa07aa362>] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640]  [<ffffffffa07628e6>] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677]  [<ffffffffa079aaac>] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764715]  [<ffffffffa079adc5>] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753]  [<ffffffffa07c5f8e>] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]
> [ 7316.764791]  [<ffffffffa07c62fe>] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799]  [<ffffffff810949bd>] process_one_work+0x1ed/0x490
> [ 7316.764806]  [<ffffffff81094ca9>] worker_thread+0x49/0x500
> [ 7316.764813]  [<ffffffff81094c60>] ? process_one_work+0x490/0x490
> [ 7316.764820]  [<ffffffff8109ac3a>] kthread+0xda/0xf0
> [ 7316.764830]  [<ffffffff815c553f>] ret_from_fork+0x1f/0x40
> [ 7316.764838]  [<ffffffff8109ab60>] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
> 
> After seeing this, I decided to attempt a repair (confident that I could
> restore from backup if it failed). At the time, I was unaware of the
> issues with progs 4.7.1, so when I ran the check and saw all the
> incorrect backrefs messages, I figured that was my problem and ran the
> --repair. Of course, this didn't make the messages go away on subsequent
> checks, so I looked further and found this bug:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=155791
> 
> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
> the logs from these, unfortunately). The repair seemed to work (I also
> used --init-extent-tree), as current checks don't report any errors.
> 
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
> 
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?

Interesting. It seems the error comes from this call chain:

btrfs_finish_ordered_io
  insert_reserved_file_extent
    __btrfs_drop_extents

where splitting an inline extent returns -95 (EOPNOTSUPP).
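
For anyone reading a log like the one quoted above, here is a minimal Python sketch of mapping the numeric codes back to Linux errno names. It is not part of btrfs-progs; the helper name decode_btrfs_errors and the small LINUX_ERRNO table are illustrative only, and the table holds just a handful of values from the kernel's asm-generic errno headers.

```python
import re

# Small subset of Linux errno values (asm-generic errno headers).
# Hard-coded on purpose so the result does not depend on the host OS.
LINUX_ERRNO = {
    5: "EIO",
    12: "ENOMEM",
    17: "EEXIST",
    28: "ENOSPC",
    30: "EROFS",
    95: "EOPNOTSUPP",
    117: "EUCLEAN",
}

# Matches both "(error -95)" and "errno=-95" as they appear in the log.
_ERR_RE = re.compile(r"(?:error |errno=)-(\d+)")


def decode_btrfs_errors(log_text):
    """Return a sorted list of (code, name) pairs found in log_text.

    Hypothetical helper: codes missing from LINUX_ERRNO map to "unknown".
    """
    found = set()
    for match in _ERR_RE.finditer(log_text):
        code = int(match.group(1))
        found.add((code, LINUX_ERRNO.get(code, "unknown")))
    return sorted(found)


sample = (
    "[ 7316.764297] BTRFS: Transaction aborted (error -95)\n"
    "[ 7316.764851] BTRFS: error (device sda2) in "
    "btrfs_finish_ordered_io:2954: errno=-95 unknown\n"
)
print(decode_btrfs_errors(sample))
```

Piping the output of dmesg into such a helper gives a quick list of the distinct error codes a transaction abort reported.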

Thanks,

-liubo
