> On 6 Oct 2017, at 01.36, Dave Chinner <[email protected]> wrote:
>
> On Thu, Oct 05, 2017 at 12:53:50PM +0200, Javier González wrote:
>> Hi,
>>
>> lockdep is reporting a circular dependency when using XFS and pblk,
>> which I am a bit confused about.
>>
>> This happens when XFS sends a number of nested reads and (at least) one
>> of them partially hits pblk's cache. In this case, pblk will retrieve
>> the cached lbas and form a new bio, which is submitted _synchronously_
>> to the media using struct completion. The original bio is then populated
>> with the read data.
>>
>> What lockdep complains about is that the unlocking operation in
>> complete() has a circular dependency with inode->i_rwsem when they both
>> happen on the same core, which is different from the core that issued
>> wait_for_completion_io_timeout() and is waiting for the partial read.
>> However, AFAIU complete() happens in interrupt context, so this should
>> not be a problem.
>>
>> The only thing I can think of is that XFS is making an assumption when
>> submitting direct I/O that there will be no synchronous I/O coming from
>> anybody underneath.
>>
>> Any thoughts?
>
> The new lockdep completion tracking that was included in 4.14 can
> throw false positives because it doesn't (yet) have the capability
> to understand the multiply layered completion contexts we have in
> the storage stack.
>
> Peter, here's another for you to ponder....
>
>> TRACE:
>> [ 8557.938336] ======================================================
>> [ 8557.944513] WARNING: possible circular locking dependency detected
>> [ 8557.950688] 4.14.0-rc2--509811f1f2df #726 Not tainted
>> [ 8557.955739] ------------------------------------------------------
>> [ 8557.961919] rocksdb:bg11/19129 is trying to acquire lock:
>> [ 8557.967319] (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>]
>> xfs_ilock+0x1a1/0x230
>> [ 8557.975063]
>> but now in release context of a crosslock acquired at the
>> following:
>> [ 8557.983921] ((complete)&wait#5){+.+.}, at: [<ffffffffad933d0a>]
>> pblk_fill_partial_read_bio+0x22a/0x520
>> [ 8557.993318]
>> which lock already depends on the new lock.
>>
>> [ 8558.001490]
>> the existing dependency chain (in reverse order) is:
>> [ 8558.008966]
>> -> #1 ((complete)&wait#5){+.+.}:
>> [ 8558.014715] __lock_acquire+0x1186/0x11d0
>> [ 8558.019245] lock_acquire+0xed/0x1e0
>> [ 8558.023337] wait_for_completion_io_timeout+0x65/0x180
>> [ 8558.028989] pblk_fill_partial_read_bio+0x22a/0x520
>> [ 8558.034375] pblk_submit_read+0x158/0x3a0
>> [ 8558.038900] pblk_make_rq+0xc1/0x110
>> [ 8558.042992] generic_make_request+0x11d/0x3c0
>> [ 8558.047862] submit_bio+0x73/0x140
>> [ 8558.051781] iomap_dio_actor+0x1f8/0x3a0
>> [ 8558.056216] iomap_apply+0xa1/0x110
>> [ 8558.060220] iomap_dio_rw+0x26f/0x3c0
>> [ 8558.064398] xfs_file_dio_aio_read+0x88/0x1c0
>> [ 8558.069269] xfs_file_read_iter+0xab/0xc0
>> [ 8558.073794] __vfs_read+0xef/0x160
>> [ 8558.077709] vfs_read+0xa8/0x150
>> [ 8558.081454] SyS_pread64+0x98/0xc0
>> [ 8558.085372] entry_SYSCALL_64_fastpath+0x23/0xc2
>
> This is completely normal, blocked on a bio completion with a
> filesystem lock held. More commonly we see this sort of thing
> through submit_bio_wait().
Yes. We do the same construct in pblk.
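For reference, the construct looks roughly like this (a simplified
sketch based on the call chain in the trace; field names and the
timeout value are approximate, not the exact pblk code):

#include <linux/completion.h>
#include <linux/lightnvm.h>

static void pblk_end_io_sync(struct nvm_rq *rqd)
{
	struct completion *waiting = rqd->private;

	/* Runs in the completion context of this request. */
	complete(waiting);
}

static int pblk_submit_read_sync(struct nvm_tgt_dev *dev, struct nvm_rq *rqd)
{
	DECLARE_COMPLETION_ONSTACK(wait);
	int ret;

	rqd->end_io = pblk_end_io_sync;
	rqd->private = &wait;

	/* Send the read for the cached sectors down to the media... */
	ret = nvm_submit_io(dev, rqd);
	if (ret)
		return ret;

	/* ...and block here, still under the caller's context and locks. */
	if (!wait_for_completion_io_timeout(&wait,
					    msecs_to_jiffies(30000)))
		return -ETIME;

	return 0;
}

pblk_end_io_sync() above is the complete() call that shows up in the
backtrace further down.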
>
>> [ 8558.125636] Possible unsafe locking scenario by crosslock:
>>
>> [ 8558.132672]        CPU0                    CPU1
>> [ 8558.137196]        ----                    ----
>> [ 8558.141720]   lock(&inode->i_rwsem);
>> [ 8558.145291]   lock((complete)&wait#5);
>> [ 8558.149035]                                lock(&inode->i_rwsem);
>> [ 8558.155118]                                unlock((complete)&wait#5);
>> [ 8558.161550]
>> *** DEADLOCK ***
>>
>> [ 8558.167461] 4 locks held by rocksdb:bg11/19129:
>> [ 8558.171984] #0: (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>]
>> xfs_ilock+0x1a1/0x230
>> [ 8558.180148] #1: (rcu_read_lock){....}, at: [<ffffffffad72b8cc>]
>> __blk_mq_run_hw_queue+0xec/0x1d0
>> [ 8558.189091] #2: (&(&nvmeq->q_lock)->rlock){-.-.}, at:
>> [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
>> [ 8558.198384] #3: (&x->wait#17){-.-.}, at: [<ffffffffad16444d>]
>> complete+0x1d/0x60
>> [ 8558.205948]
>> stack backtrace:
>> [ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted
>> 4.14.0-rc2--509811f1f2df #726
>> [ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a
>> 06/23/2016
>> [ 8558.225675] Call Trace:
>> [ 8558.228123] dump_stack+0x85/0xc7
>> [ 8558.231438] print_circular_bug+0x226/0x330
>> [ 8558.235616] ? print_bfs_bug+0x40/0x40
>> [ 8558.239358] check_prev_add+0x432/0x840
>> [ 8558.243190] ? __lock_acquire+0x736/0x11d0
>> [ 8558.247282] lock_commit_crosslock+0x3d3/0x5d0
>> [ 8558.251718] ? lock_commit_crosslock+0x3d3/0x5d0
>> [ 8558.256328] complete+0x29/0x60
>> [ 8558.259469] pblk_end_io_sync+0x12/0x20
>> [ 8558.263297] nvm_end_io+0x2b/0x30
>> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
>
> And this is where lockdep gets confused. Up until this point, it's
> only ever seen the bio completion come from /outside/ the XFS
> i_rwsem context (i.e. async completion from interrupts). With a
> synchronous completion like this, the completion has now been done
> from inside the i_rwsem context and that is what triggers lockdep.
>
> This is safe to do because even if we have a bio blocked on a
> completion queue, we've got all the way down to this complete() call
> from a *different bio context* inside a read locked i_rwsem. Hence
> there's no way this can be a deadlock even if the two processes are
> holding the same i_rwsem.
>
> Therefore: false positive.
Thanks for the explanation, Dave. I could not see how waiting for a
struct completion under generic_make_request() could in any way
interleave with an FS lock, but it is good to have it confirmed.
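Just to check my understanding, a userspace analogue of the scenario
would be something like the program below (purely illustrative; the
names only mirror the kernel ones). Because both sides take the rwlock
shared, the thread issuing the "complete()" can never be blocked behind
the thread waiting for it, so the cycle lockdep infers can never close:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static pthread_rwlock_t i_rwsem = PTHREAD_RWLOCK_INITIALIZER;
static sem_t wait5;	/* stands in for the on-stack struct completion */

static void *reader_a(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&i_rwsem);	/* like xfs_ilock(SHARED) */
	sem_wait(&wait5);			/* like wait_for_completion() */
	pthread_rwlock_unlock(&i_rwsem);
	puts("A: woken up, no deadlock");
	return NULL;
}

static void *reader_b(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&i_rwsem);	/* shared: never blocks on A */
	sem_post(&wait5);			/* like complete() under i_rwsem */
	pthread_rwlock_unlock(&i_rwsem);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	sem_init(&wait5, 0, 0);
	pthread_create(&a, NULL, reader_a, NULL);
	pthread_create(&b, NULL, reader_b, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}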
Javier.
