On Fri 23-05-25 14:37:49, Alistair Popple wrote:
> Commit 6be3e21d25ca ("fs/dax: don't skip locked entries when scanning
> entries") introduced a new function, wait_entry_unlocked_exclusive(),
> which waits for the current entry to become unlocked without advancing
> the XArray iterator state.
> 
> Waiting for the entry to become unlocked requires dropping the XArray
> lock. This requires calling xas_pause() prior to dropping the lock
> which leaves the xas in a suitable state for the next iteration. However
> this has the side-effect of advancing the xas state to the next index.
> Normally this isn't an issue because xas_for_each() contains code to
> detect this state and thus avoid advancing the index a second time on
> the next loop iteration.
> 
> However both callers of and wait_entry_unlocked_exclusive() itself
> subsequently use the xas state to reload the entry. As xas_pause()
> updated the state to the next index this will cause the current entry
> which is being waited on to be skipped. This caused the following
> warning to fire intermittently when running xftest generic/068 on an XFS
> filesystem with FS DAX enabled:
> 
> [   35.067397] ------------[ cut here ]------------
> [   35.068229] WARNING: CPU: 21 PID: 1640 at mm/truncate.c:89 
> truncate_folio_batch_exceptionals+0xd8/0x1e0
> [   35.069717] Modules linked in: nd_pmem dax_pmem nd_btt nd_e820 libnvdimm
> [   35.071006] CPU: 21 UID: 0 PID: 1640 Comm: fstest Not tainted 6.15.0-rc7+ 
> #77 PREEMPT(voluntary)
> [   35.072613] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/204
> [   35.074845] RIP: 0010:truncate_folio_batch_exceptionals+0xd8/0x1e0
> [   35.075962] Code: a1 00 00 00 f6 47 0d 20 0f 84 97 00 00 00 4c 63 e8 41 39 
> c4 7f 0b eb 61 49 83 c5 01 45 39 ec 7e 58 42 f68
> [   35.079522] RSP: 0018:ffffb04e426c7850 EFLAGS: 00010202
> [   35.080359] RAX: 0000000000000000 RBX: ffff9d21e3481908 RCX: 
> ffffb04e426c77f4
> [   35.081477] RDX: ffffb04e426c79e8 RSI: ffffb04e426c79e0 RDI: 
> ffff9d21e34816e8
> [   35.082590] RBP: ffffb04e426c79e0 R08: 0000000000000001 R09: 
> 0000000000000003
> [   35.083733] R10: 0000000000000000 R11: 822b53c0f7a49868 R12: 
> 000000000000001f
> [   35.084850] R13: 0000000000000000 R14: ffffb04e426c78e8 R15: 
> fffffffffffffffe
> [   35.085953] FS:  00007f9134c87740(0000) GS:ffff9d22abba0000(0000) 
> knlGS:0000000000000000
> [   35.087346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   35.088244] CR2: 00007f9134c86000 CR3: 000000040afff000 CR4: 
> 00000000000006f0
> [   35.089354] Call Trace:
> [   35.089749]  <TASK>
> [   35.090168]  truncate_inode_pages_range+0xfc/0x4d0
> [   35.091078]  truncate_pagecache+0x47/0x60
> [   35.091735]  xfs_setattr_size+0xc7/0x3e0
> [   35.092648]  xfs_vn_setattr+0x1ea/0x270
> [   35.093437]  notify_change+0x1f4/0x510
> [   35.094219]  ? do_truncate+0x97/0xe0
> [   35.094879]  do_truncate+0x97/0xe0
> [   35.095640]  path_openat+0xabd/0xca0
> [   35.096278]  do_filp_open+0xd7/0x190
> [   35.096860]  do_sys_openat2+0x8a/0xe0
> [   35.097459]  __x64_sys_openat+0x6d/0xa0
> [   35.098076]  do_syscall_64+0xbb/0x1d0
> [   35.098647]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [   35.099444] RIP: 0033:0x7f9134d81fc1
> [   35.100033] Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 2a 
> 26 0e 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff5
> [   35.102993] RSP: 002b:00007ffcd41e0d10 EFLAGS: 00000202 ORIG_RAX: 
> 0000000000000101
> [   35.104263] RAX: ffffffffffffffda RBX: 0000000000000242 RCX: 
> 00007f9134d81fc1
> [   35.105452] RDX: 0000000000000242 RSI: 00007ffcd41e1200 RDI: 
> 00000000ffffff9c
> [   35.106663] RBP: 00007ffcd41e1200 R08: 0000000000000000 R09: 
> 0000000000000064
> [   35.107923] R10: 00000000000001a4 R11: 0000000000000202 R12: 
> 0000000000000066
> [   35.109112] R13: 0000000000100000 R14: 0000000000100000 R15: 
> 0000000000000400
> [   35.110357]  </TASK>
> [   35.110769] irq event stamp: 8415587
> [   35.111486] hardirqs last  enabled at (8415599): [<ffffffff8d74b562>] 
> __up_console_sem+0x52/0x60
> [   35.113067] hardirqs last disabled at (8415610): [<ffffffff8d74b547>] 
> __up_console_sem+0x37/0x60
> [   35.114575] softirqs last  enabled at (8415300): [<ffffffff8d6ac625>] 
> handle_softirqs+0x315/0x3f0
> [   35.115933] softirqs last disabled at (8415291): [<ffffffff8d6ac811>] 
> __irq_exit_rcu+0xa1/0xc0
> [   35.117316] ---[ end trace 0000000000000000 ]---
> 
> Fix this by using xas_reset() instead, which is equivalent in
> implementation to xas_pause() but does not advance the XArray state.
> 
> Fixes: 6be3e21d25ca ("fs/dax: don't skip locked entries when scanning 
> entries")
> Signed-off-by: Alistair Popple <apop...@nvidia.com>
> Cc: Dan Williams <dan.j.willi...@intel.com>
> Cc: Alison Schofield <alison.schofi...@intel.com>
> Cc: Matthew Wilcow (Oracle) <wi...@infradead.org>
> Cc: Balbir Singh <balb...@nvidia.com>
> Cc: "Darrick J. Wong" <djw...@kernel.org>
> Cc: Dave Chinner <da...@fromorbit.com>
> Cc: David Hildenbrand <da...@redhat.com>
> Cc: Jan Kara <j...@suse.cz>
> Cc: John Hubbard <jhubb...@nvidia.com>
> Cc: Ted Ts'o <ty...@mit.edu>
> Cc: Alexander Viro <v...@zeniv.linux.org.uk>
> Cc: Christian Brauner <brau...@kernel.org>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <j...@suse.cz>

                                                                Honza

> ---
> 
> Hi Andrew,
> 
> Apologies for finding this so late in the cycle. This is a very
> intermittent issue for me, and it seems it was only exposed by a recent
> upgrade to my test machine/setup. The user visible impact is the same
> as for the original commit this fixes. That is possible file data
> corruption if a device has a FS DAX page pinned for DMA.
> 
> So in other words it means my original fix was not 100% effective.
> The issue that commit fixed has existed for a long time without being
> reported, so not sure if this is worth trying to get into v6.15 or not.
> 
> Either way I figured it would be best to send this ASAP, which means I
> am still waiting for a complete xfstest run to complete (although the
> failing test does now pass cleanly).
> ---
>  fs/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 676303419e9e..f8d8b1afd232 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -257,7 +257,7 @@ static void *wait_entry_unlocked_exclusive(struct 
> xa_state *xas, void *entry)
>               wq = dax_entry_waitqueue(xas, entry, &ewait.key);
>               prepare_to_wait_exclusive(wq, &ewait.wait,
>                                       TASK_UNINTERRUPTIBLE);
> -             xas_pause(xas);
> +             xas_reset(xas);
>               xas_unlock_irq(xas);
>               schedule();
>               finish_wait(wq, &ewait.wait);
> -- 
> 2.47.2
> 
-- 
Jan Kara <j...@suse.com>
SUSE Labs, CR

Reply via email to