On 2018/11/4 9:15 PM, Sebastian Ochmann wrote:
> Hello,
> 
> I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which
> stopped working correctly. The drive is used as a backup drive with zstd
> compression to which I regularly rsync and make daily snapshots. After I
> routinely removed a bunch of snapshots (about 20), I noticed later that
> the machine would hang when trying to unmount the filesystem. The
> current state is that I'm able to mount the filesystem without errors
> and I can view (ls) files in the root level, but trying to view contents
> of directories contained therein hangs just like when trying to unmount
> the filesystem. I have not yet tried to run check, repair, etc. Do you
> have any advice on what I should try next?

Could you please run "btrfs check" on the unmounted fs?

> 
> A notable hardware change I did a few days before the problem is a
> switch from an Intel Xeon platform to AMD Threadripper. However, I
> haven't yet seen problems with the other btrfs filesystems (in
> particular, a RAID-1 consisting of three HDDs), which I also migrated to
> the new platform. I just want to mention it in case there are known
> issues in that direction.
> 
> Kernel 4.18.16 (Arch Linux)
> btrfs-progs 4.17.1
> 
> Kernel log after trying to "ls" a directory contained in the
> filesystem's root directory:
> 
> [   79.279349] BTRFS info (device dm-5): use zstd compression, level 0
> [   79.279351] BTRFS info (device dm-5): disk space caching is enabled
> [   79.279352] BTRFS info (device dm-5): has skinny extents
> [  135.202344] kauditd_printk_skb: 2 callbacks suppressed
> [  135.202347] audit: type=1130 audit(1541335770.667:45): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [  135.364850] audit: type=1130 audit(1541335770.831:46): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=udisks2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [  135.589255] audit: type=1130 audit(1541335771.054:47): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [  368.266653] INFO: task kworker/u256:1:728 blocked for more than 120 seconds.
> [  368.266657]       Tainted: P           OE     4.18.16-arch1-1-ARCH #1
> [  368.266658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  368.266660] kworker/u256:1  D    0   728      2 0x80000080
> [  368.266680] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
> [  368.266681] Call Trace:
> [  368.266687]  ? __schedule+0x29b/0x8b0
> [  368.266690]  ? preempt_count_add+0x68/0xa0
> [  368.266692]  schedule+0x32/0x90
> [  368.266707]  btrfs_tree_read_lock+0x7d/0x110 [btrfs]
> [  368.266710]  ? wait_woken+0x80/0x80
> [  368.266719]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> [  368.266729]  btrfs_search_slot+0xf6/0xa00 [btrfs]
> [  368.266732]  ? _raw_spin_unlock+0x16/0x30
> [  368.266734]  ? inode_insert5+0x105/0x1a0
> [  368.266746]  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> [  368.266749]  ? kmem_cache_alloc+0x179/0x1d0
> [  368.266762]  btrfs_iget+0x113/0x690 [btrfs]
> [  368.266764]  ? _raw_spin_unlock+0x16/0x30
> [  368.266778]  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> [  368.266792]  lookup_free_space_inode+0x63/0xc0 [btrfs]
> [  368.266806]  load_free_space_cache+0x6e/0x190 [btrfs]
> [  368.266808]  ? kmem_cache_alloc_trace+0x181/0x1d0
> [  368.266817]  ? cache_block_group+0x73/0x3e0 [btrfs]
> [  368.266827]  cache_block_group+0x1c1/0x3e0 [btrfs]

This thread is trying to take the tree root lock in order to load the
free space cache, while someone else already holds the tree root lock.

> [  368.266829]  ? wait_woken+0x80/0x80
> [  368.266839]  find_free_extent+0x872/0x10e0 [btrfs]
> [  368.266851]  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> [  368.266862]  btrfs_alloc_tree_block+0x1b3/0x4d0 [btrfs]
> [  368.266872]  __btrfs_cow_block+0x11d/0x500 [btrfs]
> [  368.266882]  btrfs_cow_block+0xdc/0x1a0 [btrfs]
> [  368.266891]  btrfs_search_slot+0x282/0xa00 [btrfs]
> [  368.266893]  ? _raw_spin_unlock+0x16/0x30
> [  368.266903]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
> [  368.266913]  __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs]
> [  368.266915]  ? preempt_count_add+0x68/0xa0
> [  368.266926]  btrfs_run_delayed_refs+0x72/0x180 [btrfs]
> [  368.266937]  delayed_ref_async_start+0x81/0x90 [btrfs]
> [  368.266950]  normal_work_helper+0xbd/0x350 [btrfs]
> [  368.266953]  process_one_work+0x1eb/0x3c0
> [  368.266955]  worker_thread+0x2d/0x3d0
> [  368.266956]  ? process_one_work+0x3c0/0x3c0
> [  368.266958]  kthread+0x112/0x130
> [  368.266960]  ? kthread_flush_work_fn+0x10/0x10
> [  368.266961]  ret_from_fork+0x22/0x40
> [  368.266978] INFO: task btrfs-cleaner:1196 blocked for more than 120 seconds.
[snip, this trace doesn't look interesting at all]
> [  368.267135] INFO: task btrfs-transacti:1197 blocked for more than 120 seconds.
> [  368.267136]       Tainted: P           OE     4.18.16-arch1-1-ARCH #1
> [  368.267137] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  368.267138] btrfs-transacti D    0  1197      2 0x80000080
> [  368.267139] Call Trace:
> [  368.267141]  ? __schedule+0x29b/0x8b0
> [  368.267142]  ? preempt_count_add+0x68/0xa0
> [  368.267144]  schedule+0x32/0x90
> [  368.267156]  btrfs_tree_lock+0x113/0x1f0 [btrfs]
> [  368.267158]  ? wait_woken+0x80/0x80
> [  368.267167]  btrfs_search_slot+0x7b4/0xa00 [btrfs]
> [  368.267178]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
> [  368.267188]  __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs]
> [  368.267190]  ? _raw_spin_unlock_irq+0x1d/0x30
> [  368.267201]  btrfs_run_delayed_refs+0x72/0x180 [btrfs]
> [  368.267212]  btrfs_commit_transaction+0x40/0x8a0 [btrfs]

And this thread is committing the transaction.
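To illustrate the kind of circular wait the two traces above suggest, here is a minimal Python sketch. It is only an analogy, not btrfs code: `tree_root_lock` and `extent_block_lock` are hypothetical stand-ins for btrfs tree locks (read/write lock semantics are ignored), and the "root_holder" thread is an assumption, since the task that actually holds the tree root lock is not visible in the traces quoted here.

```python
import threading

# Hypothetical stand-ins for two btrfs tree locks; these are plain mutexes,
# not the kernel's tree read/write locks.
tree_root_lock = threading.Lock()
extent_block_lock = threading.Lock()

# Both threads take their first lock before asking for the second one.
both_hold_first = threading.Barrier(2)
# Both threads keep their first lock until both have given up waiting,
# so one thread's timeout cannot accidentally unblock the other.
both_done = threading.Barrier(2)

results = {}


def task(name, first, second, timeout=1.0):
    with first:
        both_hold_first.wait()
        # Wait for a lock the other thread holds (cf. btrfs_tree_read_lock /
        # btrfs_tree_lock in the traces). The timeout only lets the sketch end.
        got_second = second.acquire(timeout=timeout)
        results[name] = got_second
        both_done.wait()
        if got_second:
            second.release()


# "kworker": assumed to hold an extent tree block lock (taken while COWing),
# then needs the tree root lock to load the free space cache.
kworker = threading.Thread(
    target=task, args=("kworker", extent_block_lock, tree_root_lock))
# Hypothetical owner of the tree root lock, assumed to wait on the extent block.
root_holder = threading.Thread(
    target=task, args=("root_holder", tree_root_lock, extent_block_lock))

kworker.start()
root_holder.start()
kworker.join()
root_holder.join()

print(sorted(results.items()))  # [('kworker', False), ('root_holder', False)]
```

Neither wait can succeed, so both threads report False. The timeout exists only so the sketch terminates; in the kernel the tasks simply stay blocked in D state, and the hung-task detector reports them after hung_task_timeout_secs (120 seconds here).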

This looks very much like a bug which should be fixed by the following
patch:

https://patchwork.kernel.org/patch/10654433/

If the "btrfs check" I asked for above shows no errors, could you please
apply that patch and try again?

Thanks,
Qu
