On 2018/11/4 下午9:15, Sebastian Ochmann wrote: > Hello, > > I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which > stopped working correctly. The drive is used as a backup drive with zstd > compression to which I regularly rsync and make daily snapshots. After I > routinely removed a bunch of snapshots (about 20), I noticed later that > the machine would hang when trying to unmount the filesystem. The > current state is that I'm able to mount the filesystem without errors > and I can view (ls) files in the root level, but trying to view contents > of directories contained therein hangs just like when trying to unmount > the filesystem. I have not yet tried to run check, repair, etc. Do you > have any advice what I should try next?
Could you please run "btrfs check" on the umounted fs? > > A notable hardware change I did a few days before the problem is a > switch from an Intel Xeon platform to AMD Threadripper. However, I > haven't seen problems with the rest of the btrfs filesystems (in > particular, a RAID-1 consisting of three HDDs), which I also migrated to > the new platform, yet. I just want to mention it in case there are known > issues in that direction. > > Kernel 4.18.16 (Arch Linux) > btrfs-progs 4.17.1 > > Kernel log after trying to "ls" a directory contained in the > filesystem's root directory: > > [ 79.279349] BTRFS info (device dm-5): use zstd compression, level 0 > [ 79.279351] BTRFS info (device dm-5): disk space caching is enabled > [ 79.279352] BTRFS info (device dm-5): has skinny extents > [ 135.202344] kauditd_printk_skb: 2 callbacks suppressed > [ 135.202347] audit: type=1130 audit(1541335770.667:45): pid=1 uid=0 > auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd" > exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > [ 135.364850] audit: type=1130 audit(1541335770.831:46): pid=1 uid=0 > auid=4294967295 ses=4294967295 msg='unit=udisks2 comm="systemd" > exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > [ 135.589255] audit: type=1130 audit(1541335771.054:47): pid=1 uid=0 > auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" > exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > [ 368.266653] INFO: task kworker/u256:1:728 blocked for more than 120 > seconds. > [ 368.266657] Tainted: P OE 4.18.16-arch1-1-ARCH #1 > [ 368.266658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 368.266660] kworker/u256:1 D 0 728 2 0x80000080 > [ 368.266680] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper > [btrfs] > [ 368.266681] Call Trace: > [ 368.266687] ? __schedule+0x29b/0x8b0 > [ 368.266690] ? preempt_count_add+0x68/0xa0 > [ 368.266692] schedule+0x32/0x90 > [ 368.266707] btrfs_tree_read_lock+0x7d/0x110 [btrfs] > [ 368.266710] ? wait_woken+0x80/0x80 > [ 368.266719] btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > [ 368.266729] btrfs_search_slot+0xf6/0xa00 [btrfs] > [ 368.266732] ? _raw_spin_unlock+0x16/0x30 > [ 368.266734] ? inode_insert5+0x105/0x1a0 > [ 368.266746] btrfs_lookup_inode+0x3a/0xc0 [btrfs] > [ 368.266749] ? kmem_cache_alloc+0x179/0x1d0 > [ 368.266762] btrfs_iget+0x113/0x690 [btrfs] > [ 368.266764] ? _raw_spin_unlock+0x16/0x30 > [ 368.266778] __lookup_free_space_inode+0xd8/0x150 [btrfs] > [ 368.266792] lookup_free_space_inode+0x63/0xc0 [btrfs] > [ 368.266806] load_free_space_cache+0x6e/0x190 [btrfs] > [ 368.266808] ? kmem_cache_alloc_trace+0x181/0x1d0 > [ 368.266817] ? cache_block_group+0x73/0x3e0 [btrfs] > [ 368.266827] cache_block_group+0x1c1/0x3e0 [btrfs] This thread is trying to get tree root lock to create free space cache, while some one already has locked the tree root. > [ 368.266829] ? wait_woken+0x80/0x80 > [ 368.266839] find_free_extent+0x872/0x10e0 [btrfs] > [ 368.266851] btrfs_reserve_extent+0x9b/0x180 [btrfs] > [ 368.266862] btrfs_alloc_tree_block+0x1b3/0x4d0 [btrfs] > [ 368.266872] __btrfs_cow_block+0x11d/0x500 [btrfs] > [ 368.266882] btrfs_cow_block+0xdc/0x1a0 [btrfs] > [ 368.266891] btrfs_search_slot+0x282/0xa00 [btrfs] > [ 368.266893] ? _raw_spin_unlock+0x16/0x30 > [ 368.266903] btrfs_insert_empty_items+0x67/0xc0 [btrfs] > [ 368.266913] __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs] > [ 368.266915] ? preempt_count_add+0x68/0xa0 > [ 368.266926] btrfs_run_delayed_refs+0x72/0x180 [btrfs] > [ 368.266937] delayed_ref_async_start+0x81/0x90 [btrfs] > [ 368.266950] normal_work_helper+0xbd/0x350 [btrfs] > [ 368.266953] process_one_work+0x1eb/0x3c0 > [ 368.266955] worker_thread+0x2d/0x3d0 > [ 368.266956] ? process_one_work+0x3c0/0x3c0 > [ 368.266958] kthread+0x112/0x130 > [ 368.266960] ? kthread_flush_work_fn+0x10/0x10 > [ 368.266961] ret_from_fork+0x22/0x40 > [ 368.266978] INFO: task btrfs-cleaner:1196 blocked for more than 120 > seconds. [snip, this trace doesn't look interesting at all] > [ 368.267135] INFO: task btrfs-transacti:1197 blocked for more than 120 > seconds. > [ 368.267136] Tainted: P OE 4.18.16-arch1-1-ARCH #1 > [ 368.267137] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 368.267138] btrfs-transacti D 0 1197 2 0x80000080 > [ 368.267139] Call Trace: > [ 368.267141] ? __schedule+0x29b/0x8b0 > [ 368.267142] ? preempt_count_add+0x68/0xa0 > [ 368.267144] schedule+0x32/0x90 > [ 368.267156] btrfs_tree_lock+0x113/0x1f0 [btrfs] > [ 368.267158] ? wait_woken+0x80/0x80 > [ 368.267167] btrfs_search_slot+0x7b4/0xa00 [btrfs] > [ 368.267178] btrfs_insert_empty_items+0x67/0xc0 [btrfs] > [ 368.267188] __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs] > [ 368.267190] ? _raw_spin_unlock_irq+0x1d/0x30 > [ 368.267201] btrfs_run_delayed_refs+0x72/0x180 [btrfs] > [ 368.267212] btrfs_commit_transaction+0x40/0x8a0 [btrfs] And thread is running commit transaction. This looks pretty like this bug which should be fixed by the following patch: https://patchwork.kernel.org/patch/10654433/ If previous "btrfs check" shows no error, would you please try apply that patch and try again? Thanks, Qu
signature.asc
Description: OpenPGP digital signature