Alistair Popple wrote: > Commit 6be3e21d25ca ("fs/dax: don't skip locked entries when scanning > entries") introduced a new function, wait_entry_unlocked_exclusive(), > which waits for the current entry to become unlocked without advancing > the XArray iterator state. > > Waiting for the entry to become unlocked requires dropping the XArray > lock. This requires calling xas_pause() prior to dropping the lock > which leaves the xas in a suitable state for the next iteration. However > this has the side-effect of advancing the xas state to the next index. > Normally this isn't an issue because xas_for_each() contains code to > detect this state and thus avoid advancing the index a second time on > the next loop iteration. > > However both callers of and wait_entry_unlocked_exclusive() itself > subsequently use the xas state to reload the entry. As xas_pause() > updated the state to the next index this will cause the current entry > which is being waited on to be skipped. This caused the following > warning to fire intermittently when running xftest generic/068 on an XFS > filesystem with FS DAX enabled: > > [ 35.067397] ------------[ cut here ]------------ > [ 35.068229] WARNING: CPU: 21 PID: 1640 at mm/truncate.c:89 > truncate_folio_batch_exceptionals+0xd8/0x1e0 [..] > > Fix this by using xas_reset() instead, which is equivalent in > implementation to xas_pause() but does not advance the XArray state. > > Fixes: 6be3e21d25ca ("fs/dax: don't skip locked entries when scanning > entries") > Signed-off-by: Alistair Popple <apop...@nvidia.com> [..] > > Hi Andrew, > > Apologies for finding this so late in the cycle. This is a very > intermittent issue for me, and it seems it was only exposed by a recent > upgrade to my test machine/setup. The user visible impact is the same > as for the original commit this fixes. That is possible file data > corruption if a device has a FS DAX page pinned for DMA. > > So in other words it means my original fix was not 100% effective. > The issue that commit fixed has existed for a long time without being > reported, so not sure if this is worth trying to get into v6.15 or not. > > Either way I figured it would be best to send this ASAP, which means I > am still waiting for a complete xfstest run to complete (although the > failing test does now pass cleanly). > --- > fs/dax.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 676303419e9e..f8d8b1afd232 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -257,7 +257,7 @@ static void *wait_entry_unlocked_exclusive(struct > xa_state *xas, void *entry) > wq = dax_entry_waitqueue(xas, entry, &ewait.key); > prepare_to_wait_exclusive(wq, &ewait.wait, > TASK_UNINTERRUPTIBLE); > - xas_pause(xas); > + xas_reset(xas); > xas_unlock_irq(xas); > schedule(); > finish_wait(wq, &ewait.wait);
This looks super-subtle, but so did the original fix commit 6be3e21d25ca ("fs/dax: don't skip locked entries when scanning entries"). The resolution is the same to make sure the xarray state does not mistakenly advance when the lock is dropped. You can add: Reviewed-by: Dan Williams <dan.j.willi...@intel.com>