Sadly, now I hit another one: [ 378.433842] ============================================= [ 378.433842] [ INFO: possible recursive locking detected ] [ 378.433845] 3.6.0-rc2-ceph-00143-g995fc06 #1 Not tainted [ 378.433845] --------------------------------------------- [ 378.433847] kworker/6:1/238 is trying to acquire lock: [ 378.433872] (sb_internal#2){.+.+..}, at: [<ffffffffa0042b74>] start_transaction+0x124/0x430 [btrfs] [ 378.433873] [ 378.433873] but task is already holding lock: [ 378.433890] (sb_internal#2){.+.+..}, at: [<ffffffffa0042590>] do_async_commit+0x0/0x80 [btrfs] [ 378.433891] [ 378.433891] other info that might help us debug this: [ 378.433892] Possible unsafe locking scenario: [ 378.433892] [ 378.433892] CPU0 [ 378.433893] ---- [ 378.433895] lock(sb_internal#2); [ 378.433897] lock(sb_internal#2); [ 378.433898] [ 378.433898] *** DEADLOCK *** [ 378.433898] [ 378.433898] May be due to missing lock nesting notation [ 378.433898] [ 378.433899] 3 locks held by kworker/6:1/238: [ 378.433906] #0: (events){.+.+.+}, at: [<ffffffff810717d6>] process_one_work+0x136/0x5f0 [ 378.433911] #1: ((&(&ac->work)->work)){+.+...}, at: [<ffffffff810717d6>] process_one_work+0x136/0x5f0 [ 378.433929] #2: (sb_internal#2){.+.+..}, at: [<ffffffffa0042590>] do_async_commit+0x0/0x80 [btrfs] [ 378.433932] [ 378.433932] stack backtrace: [ 378.433935] Pid: 238, comm: kworker/6:1 Not tainted 3.6.0-rc2-ceph-00143-g995fc06 #1 [ 378.433936] Call Trace: [ 378.433941] [<ffffffff810b2032>] __lock_acquire+0x1512/0x1b90 [ 378.433944] [<ffffffff810ada73>] ? __bfs+0x23/0x270 [ 378.433961] [<ffffffffa0042b74>] ? start_transaction+0x124/0x430 [btrfs] [ 378.433964] [<ffffffff810b2c82>] lock_acquire+0xa2/0x140 [ 378.433980] [<ffffffffa0042b74>] ? start_transaction+0x124/0x430 [btrfs] [ 378.433982] [<ffffffff810b3546>] ? mark_held_locks+0x86/0x140 [ 378.433987] [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0 [ 378.434003] [<ffffffffa0042b74>] ? start_transaction+0x124/0x430 [btrfs] [ 378.434019] [<ffffffffa0042b74>] ? start_transaction+0x124/0x430 [btrfs] [ 378.434022] [<ffffffff81172e75>] ? kmem_cache_alloc+0xb5/0x160 [ 378.434024] [<ffffffff81172f9b>] ? kmem_cache_free+0x7b/0x160 [ 378.434042] [<ffffffffa0058b48>] ? free_extent_state+0x58/0xd0 [btrfs] [ 378.434058] [<ffffffffa0042b74>] start_transaction+0x124/0x430 [btrfs] [ 378.434076] [<ffffffffa005940d>] ? __set_extent_bit+0x37d/0x4e0 [btrfs] [ 378.434092] [<ffffffffa0042ed5>] btrfs_join_transaction+0x15/0x20 [btrfs] [ 378.434109] [<ffffffffa00496b7>] cow_file_range+0x87/0x4a0 [btrfs] [ 378.434114] [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40 [ 378.434131] [<ffffffffa004a80c>] run_delalloc_range+0x34c/0x370 [btrfs] [ 378.434149] [<ffffffffa005cbb0>] __extent_writepage+0x5e0/0x770 [btrfs] [ 378.434152] [<ffffffff810b3546>] ? mark_held_locks+0x86/0x140 [ 378.434155] [<ffffffff8112aa5e>] ? find_get_pages_tag+0x2e/0x1c0 [ 378.434174] [<ffffffffa005cffa>] extent_write_cache_pages.isra.25.constprop.39+0x2ba/0x410 [btrfs] [ 378.434187] [<ffffffffa002f7cc>] ? btrfs_run_delayed_refs+0xac/0x550 [btrfs] [ 378.434190] [<ffffffff81196117>] ? igrab+0x27/0x70 [ 378.434208] [<ffffffffa005d389>] extent_writepages+0x49/0x60 [btrfs] [ 378.434224] [<ffffffffa0046a90>] ? btrfs_submit_direct+0x670/0x670 [btrfs] [ 378.434240] [<ffffffffa00444c8>] btrfs_writepages+0x28/0x30 [btrfs] [ 378.434243] [<ffffffff81136443>] do_writepages+0x23/0x40 [ 378.434247] [<ffffffff8112b839>] __filemap_fdatawrite_range+0x59/0x60 [ 378.434249] [<ffffffff8112c6ac>] filemap_flush+0x1c/0x20 [ 378.434266] [<ffffffffa0050b1e>] btrfs_start_delalloc_inodes+0xbe/0x200 [btrfs] [ 378.434270] [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0 [ 378.434286] [<ffffffffa0041ebd>] btrfs_commit_transaction+0x44d/0xb20 [btrfs] [ 378.434290] [<ffffffff81079850>] ? __init_waitqueue_head+0x60/0x60 [ 378.434293] [<ffffffff810717d6>] ? process_one_work+0x136/0x5f0 [ 378.434308] [<ffffffffa00425f1>] do_async_commit+0x61/0x80 [btrfs] [ 378.434324] [<ffffffffa0042590>] ? btrfs_commit_transaction+0xb20/0xb20 [btrfs] [ 378.434327] [<ffffffff81071840>] process_one_work+0x1a0/0x5f0 [ 378.434330] [<ffffffff810717d6>] ? process_one_work+0x136/0x5f0 [ 378.434346] [<ffffffffa0042590>] ? btrfs_commit_transaction+0xb20/0xb20 [btrfs] [ 378.434350] [<ffffffff8107360d>] worker_thread+0x18d/0x4c0 [ 378.434354] [<ffffffff81073480>] ? manage_workers.isra.22+0x2c0/0x2c0 [ 378.434356] [<ffffffff810791ee>] kthread+0xae/0xc0 [ 378.434359] [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10 [ 378.434363] [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10 [ 378.434366] [<ffffffff81635430>] ? retint_restore_args+0x13/0x13 [ 378.434368] [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0 [ 378.434371] [<ffffffff8163e740>] ? gs_change+0x13/0x13
On Fri, 24 Aug 2012, Sage Weil wrote: > The freeze rwsem is taken by sb_start_intwrite() and dropped during the > commit_ or end_transaction(). In the async case, that happens in a worker > thread. Tell lockdep the calling thread is releasing ownership of the > rwsem and the async thread is picking it up. > > Josef and I worked out a more complicated solution that made the async > commit thread join and potentially get a later transaction, but it failed > my initial smoke test and Dave pointed out that XFS avoids the issue by > just telling lockdep what's up. This is much simpler. XFS does the same > thing in fs/xfs/xfs_aops.c. > > Signed-off-by: Sage Weil <s...@inktank.com> > --- > fs/btrfs/transaction.c | 16 ++++++++++++++++ > 1 files changed, 16 insertions(+), 0 deletions(-) > > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > index 17be3de..efc41a5 100644 > --- a/fs/btrfs/transaction.c > +++ b/fs/btrfs/transaction.c > @@ -1228,6 +1228,14 @@ static void do_async_commit(struct work_struct *work) > struct btrfs_async_commit *ac = > container_of(work, struct btrfs_async_commit, work.work); > > + /* > + * We've got freeze protection passed with the transaction. > + * Tell lockdep about it. > + */ > + rwsem_acquire_read( > + &ac->root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1], > + 0, 1, _THIS_IP_); > + > btrfs_commit_transaction(ac->newtrans, ac->root); > kfree(ac); > } > @@ -1257,6 +1265,14 @@ int btrfs_commit_transaction_async(struct > btrfs_trans_handle *trans, > atomic_inc(&cur_trans->use_count); > > btrfs_end_transaction(trans, root); > + > + /* > + * Tell lockdep we've released the freeze rwsem, since the > + * async commit thread will be the one to unlock it. > + */ > + rwsem_release(&root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1], > + 1, _THIS_IP_); > + > schedule_delayed_work(&ac->work, 0); > > /* wait for transaction to start and unblock */ > -- > 1.7.9 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html