On Fri 23-05-25 14:37:49, Alistair Popple wrote: > Commit 6be3e21d25ca ("fs/dax: don't skip locked entries when scanning > entries") introduced a new function, wait_entry_unlocked_exclusive(), > which waits for the current entry to become unlocked without advancing > the XArray iterator state. > > Waiting for the entry to become unlocked requires dropping the XArray > lock. This requires calling xas_pause() prior to dropping the lock > which leaves the xas in a suitable state for the next iteration. However > this has the side-effect of advancing the xas state to the next index. > Normally this isn't an issue because xas_for_each() contains code to > detect this state and thus avoid advancing the index a second time on > the next loop iteration. > > However both callers of and wait_entry_unlocked_exclusive() itself > subsequently use the xas state to reload the entry. As xas_pause() > updated the state to the next index this will cause the current entry > which is being waited on to be skipped. This caused the following > warning to fire intermittently when running xftest generic/068 on an XFS > filesystem with FS DAX enabled: > > [ 35.067397] ------------[ cut here ]------------ > [ 35.068229] WARNING: CPU: 21 PID: 1640 at mm/truncate.c:89 > truncate_folio_batch_exceptionals+0xd8/0x1e0 > [ 35.069717] Modules linked in: nd_pmem dax_pmem nd_btt nd_e820 libnvdimm > [ 35.071006] CPU: 21 UID: 0 PID: 1640 Comm: fstest Not tainted 6.15.0-rc7+ > #77 PREEMPT(voluntary) > [ 35.072613] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/204 > [ 35.074845] RIP: 0010:truncate_folio_batch_exceptionals+0xd8/0x1e0 > [ 35.075962] Code: a1 00 00 00 f6 47 0d 20 0f 84 97 00 00 00 4c 63 e8 41 39 > c4 7f 0b eb 61 49 83 c5 01 45 39 ec 7e 58 42 f68 > [ 35.079522] RSP: 0018:ffffb04e426c7850 EFLAGS: 00010202 > [ 35.080359] RAX: 0000000000000000 RBX: ffff9d21e3481908 RCX: > ffffb04e426c77f4 > [ 35.081477] RDX: ffffb04e426c79e8 RSI: ffffb04e426c79e0 RDI: > ffff9d21e34816e8 > [ 35.082590] RBP: ffffb04e426c79e0 R08: 0000000000000001 R09: > 0000000000000003 > [ 35.083733] R10: 0000000000000000 R11: 822b53c0f7a49868 R12: > 000000000000001f > [ 35.084850] R13: 0000000000000000 R14: ffffb04e426c78e8 R15: > fffffffffffffffe > [ 35.085953] FS: 00007f9134c87740(0000) GS:ffff9d22abba0000(0000) > knlGS:0000000000000000 > [ 35.087346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 35.088244] CR2: 00007f9134c86000 CR3: 000000040afff000 CR4: > 00000000000006f0 > [ 35.089354] Call Trace: > [ 35.089749] <TASK> > [ 35.090168] truncate_inode_pages_range+0xfc/0x4d0 > [ 35.091078] truncate_pagecache+0x47/0x60 > [ 35.091735] xfs_setattr_size+0xc7/0x3e0 > [ 35.092648] xfs_vn_setattr+0x1ea/0x270 > [ 35.093437] notify_change+0x1f4/0x510 > [ 35.094219] ? do_truncate+0x97/0xe0 > [ 35.094879] do_truncate+0x97/0xe0 > [ 35.095640] path_openat+0xabd/0xca0 > [ 35.096278] do_filp_open+0xd7/0x190 > [ 35.096860] do_sys_openat2+0x8a/0xe0 > [ 35.097459] __x64_sys_openat+0x6d/0xa0 > [ 35.098076] do_syscall_64+0xbb/0x1d0 > [ 35.098647] entry_SYSCALL_64_after_hwframe+0x77/0x7f > [ 35.099444] RIP: 0033:0x7f9134d81fc1 > [ 35.100033] Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 2a > 26 0e 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff5 > [ 35.102993] RSP: 002b:00007ffcd41e0d10 EFLAGS: 00000202 ORIG_RAX: > 0000000000000101 > [ 35.104263] RAX: ffffffffffffffda RBX: 0000000000000242 RCX: > 00007f9134d81fc1 > [ 35.105452] RDX: 0000000000000242 RSI: 00007ffcd41e1200 RDI: > 00000000ffffff9c > [ 35.106663] RBP: 00007ffcd41e1200 R08: 0000000000000000 R09: > 0000000000000064 > [ 35.107923] R10: 00000000000001a4 R11: 0000000000000202 R12: > 0000000000000066 > [ 35.109112] R13: 0000000000100000 R14: 0000000000100000 R15: > 0000000000000400 > [ 35.110357] </TASK> > [ 35.110769] irq event stamp: 8415587 > [ 35.111486] hardirqs last enabled at (8415599): [<ffffffff8d74b562>] > __up_console_sem+0x52/0x60 > [ 35.113067] hardirqs last disabled at (8415610): [<ffffffff8d74b547>] > __up_console_sem+0x37/0x60 > [ 35.114575] softirqs last enabled at (8415300): [<ffffffff8d6ac625>] > handle_softirqs+0x315/0x3f0 > [ 35.115933] softirqs last disabled at (8415291): [<ffffffff8d6ac811>] > __irq_exit_rcu+0xa1/0xc0 > [ 35.117316] ---[ end trace 0000000000000000 ]--- > > Fix this by using xas_reset() instead, which is equivalent in > implementation to xas_pause() but does not advance the XArray state. > > Fixes: 6be3e21d25ca ("fs/dax: don't skip locked entries when scanning > entries") > Signed-off-by: Alistair Popple <apop...@nvidia.com> > Cc: Dan Williams <dan.j.willi...@intel.com> > Cc: Alison Schofield <alison.schofi...@intel.com> > Cc: Matthew Wilcow (Oracle) <wi...@infradead.org> > Cc: Balbir Singh <balb...@nvidia.com> > Cc: "Darrick J. Wong" <djw...@kernel.org> > Cc: Dave Chinner <da...@fromorbit.com> > Cc: David Hildenbrand <da...@redhat.com> > Cc: Jan Kara <j...@suse.cz> > Cc: John Hubbard <jhubb...@nvidia.com> > Cc: Ted Ts'o <ty...@mit.edu> > Cc: Alexander Viro <v...@zeniv.linux.org.uk> > Cc: Christian Brauner <brau...@kernel.org>
Looks good to me. Feel free to add: Reviewed-by: Jan Kara <j...@suse.cz> Honza > --- > > Hi Andrew, > > Apologies for finding this so late in the cycle. This is a very > intermittent issue for me, and it seems it was only exposed by a recent > upgrade to my test machine/setup. The user visible impact is the same > as for the original commit this fixes. That is possible file data > corruption if a device has a FS DAX page pinned for DMA. > > So in other words it means my original fix was not 100% effective. > The issue that commit fixed has existed for a long time without being > reported, so not sure if this is worth trying to get into v6.15 or not. > > Either way I figured it would be best to send this ASAP, which means I > am still waiting for a complete xfstest run to complete (although the > failing test does now pass cleanly). > --- > fs/dax.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 676303419e9e..f8d8b1afd232 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -257,7 +257,7 @@ static void *wait_entry_unlocked_exclusive(struct > xa_state *xas, void *entry) > wq = dax_entry_waitqueue(xas, entry, &ewait.key); > prepare_to_wait_exclusive(wq, &ewait.wait, > TASK_UNINTERRUPTIBLE); > - xas_pause(xas); > + xas_reset(xas); > xas_unlock_irq(xas); > schedule(); > finish_wait(wq, &ewait.wait); > -- > 2.47.2 > -- Jan Kara <j...@suse.com> SUSE Labs, CR