From: John Groves <[email protected]>

This series applies bug fixes (mostly found via Sashiko) to the dax/fsdev
series. It has been soaking in the famfs CI pipeline and 1) won't affect
anything that doesn't use drivers/dax/fsdev.c, and 2) doesn't affect any
known workloads -- although the bugs would manifest once multi-range DCD
dax devices are a thing (soon-ish).

Most of the series is confined to drivers/dax/fsdev.c. Two patches touch
shared DAX core in drivers/dax/super.c: patch 7 reads holder_ops once in
dax_holder_notify_failure() to close a double-fetch NULL dereference, and
patch 8 reorders fs_put_dax() and adds a WARN_ON(). fs_put_dax() is used by
ext2/ext4/erofs/xfs, but only callers that pass a holder (in-tree, that is
XFS) will see a behavior change, and that change is limited to a new warning
if they misuse the API.
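
For readers who want the double-fetch fix in concrete terms, here is a
minimal userspace sketch of the single-read pattern. It is not code from the
series: the struct and function names are hypothetical stand-ins, a relaxed
C11 atomic load approximates READ_ONCE(), and the locking and liveness checks
of the real function are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct holder_ops_sketch {
	int (*notify_failure)(void *holder, unsigned long long off,
			      unsigned long long len);
};

struct dax_dev_sketch {
	_Atomic(const struct holder_ops_sketch *) holder_ops;
	void *holder_data;
};

/* Userspace approximation of READ_ONCE(): exactly one load of the pointer. */
static const struct holder_ops_sketch *
load_ops_once(struct dax_dev_sketch *dev)
{
	return atomic_load_explicit(&dev->holder_ops, memory_order_relaxed);
}

static int notify_failure_sketch(struct dax_dev_sketch *dev,
				 unsigned long long off,
				 unsigned long long len)
{
	const struct holder_ops_sketch *ops;

	assert(dev != NULL);

	/*
	 * Fetch once into a local. The NULL check and the indirect call
	 * then use the same value, so a concurrent clear cannot slip in
	 * between them.
	 */
	ops = load_ops_once(dev);
	if (!ops)
		return -EOPNOTSUPP;
	return ops->notify_failure(dev->holder_data, off, len);
}

/* Demo callback (hypothetical): rejects zero-length reports. */
static int demo_notify(void *holder, unsigned long long off,
		       unsigned long long len)
{
	(void)holder;
	(void)off;
	return len ? 0 : -EINVAL;
}

static const struct holder_ops_sketch demo_ops = {
	.notify_failure = demo_notify,
};
```

The buggy shape would test dev->holder_ops and then dereference
dev->holder_ops again for the call, which is two independent loads.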

Changes since V4:

- New patch 7 (dax: read holder_ops once in dax_holder_notify_failure()):
  split the reader-side READ_ONCE() fix out of the fs_put_dax() patch and
  placed it first, so the fs_put_dax() patch's "a concurrent
  dax_holder_notify_failure() that sees NULL ops returns -EOPNOTSUPP
  cleanly" reasoning actually holds when it lands. dax_holder_notify_failure()
  read holder_ops twice without READ_ONCE(); a concurrent clear could make
  the NULL check pass while the indirect call dereferenced NULL. Carries
  Fixes: 8012b86608552 ("dax: introduce holder for dax_device"), the commit
  that introduced the unmarked double fetch. Suggested by Richard Cheng (and
  the Sashiko bot).
- Patch 2 (multi-range memory_failure offset): the ->memory_failure callback
  now walks the pagemap's own immutable range array (pgmap->ranges[]) rather
  than dev_dax->ranges[], which a concurrent sysfs mapping_store() can
  krealloc() under dax_region_rwsem while this callback holds no such lock.
  For dynamic devices the two arrays are identical, so the reported offset is
  unchanged for the multi-range case this patch targets. Suggested by Richard
  Cheng (and the Sashiko bot).
- Dropped the dax_dev_get()/dax_dev_find() patch (V4 patch 8) from this
  revision. There is no in-tree caller yet; it will be sent together with the
  famfs filesystem series that introduces the caller. (Per Richard Cheng /
  Sashiko.)
- Patch 8 (holder_ops race in fs_put_dax()): unchanged from V4 (renumbered
  from 7 to 8).
- Collected Reviewed-by from Dave Jiang on patches 4 and 6.
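
The multi-range offset translation mentioned for patch 2 above can be
sketched as follows. This is an illustration under assumptions, not the
patch itself: it assumes the device-relative offset is the total length of
all preceding ranges plus the offset into the containing range, and the
struct, function name, and bool return convention are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a physical range; 'end' is inclusive. */
struct range_sketch {
	uint64_t start;
	uint64_t end;
};

/*
 * Translate a physical address into a device-relative byte offset by
 * walking an array of ranges that is assumed not to change underneath us.
 * Returns false if 'phys' falls in none of the ranges.
 */
static bool phys_to_dev_offset(const struct range_sketch *ranges, size_t nr,
			       uint64_t phys, uint64_t *offset)
{
	uint64_t base = 0;	/* total length of the ranges already passed */

	assert(ranges != NULL && offset != NULL);

	for (size_t i = 0; i < nr; i++) {
		const struct range_sketch *r = &ranges[i];

		if (phys >= r->start && phys <= r->end) {
			*offset = base + (phys - r->start);
			return true;
		}
		base += r->end - r->start + 1;
	}
	return false;
}
```

With a single range, base stays 0 and the result reduces to phys minus the
range start, which is why a bug in the accumulation only shows up with
multiple ranges.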

Changes since V3:

- Patch 4: Adopted Dave's suggested refactor -- factor out
  fsdev_acquire_pgmap() and defer the dev_dax->pgmap assignment until
  probe can no longer fail, replacing the goto-based cleanup. Did not
  carry Alison's V3 Reviewed-by due to the rewrite.
- Patch 5: Also remove the now write-only dev_dax->virt_addr field,
  per Dave's review.
- Patch 7: Fixed the WARN_ON() to tolerate holder_data == NULL, which
  legitimately occurs when kill_dax() clears it during device removal
  under a live holder (per Dave's review). Wrong-holder calls still
  warn.
- Patch 8: Kept the Fixes tag -- the exported symbol itself is the
  hazard; stable kernels carrying the export should want this fix.
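
The WARN_ON() condition described for the holder_ops patch above (tolerate
holder_data == NULL, still warn on a wrong holder) can be modeled in
userspace roughly like this. It is a sketch, not the patch: the names are
hypothetical, C11 atomic_compare_exchange_strong() stands in for cmpxchg(),
and the handling of holder_ops is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; only the holder_data field is modeled. */
struct dax_holder_sketch {
	_Atomic(void *) holder_data;
};

/*
 * Release 'holder' from 'dev'. Returns true when the call looks like an
 * API contract violation, i.e. where a WARN_ON() would fire in this sketch.
 */
static bool put_holder_would_warn(struct dax_holder_sketch *dev, void *holder)
{
	void *observed = holder;

	assert(dev != NULL);

	if (!holder)
		return false;	/* caller never registered as a holder */

	/* On failure, 'observed' is updated to the value actually found. */
	if (atomic_compare_exchange_strong(&dev->holder_data, &observed,
					   (void *)NULL))
		return false;	/* we were the holder and cleared it */

	/*
	 * NULL means someone else (device teardown, in the scenario above)
	 * already cleared it; that is tolerated. Anything else is a
	 * different holder, which is misuse.
	 */
	return observed != NULL;
}
```

Note that under this predicate a second put by the same holder also finds
NULL and stays silent, so only the wrong-holder case is flagged.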

Changes since V2:

* Patch 1 (comment fix): No change. Responded to Dave's question about
  the dropped precondition -- the new comment correctly covers both
  callers; fsdev_clear_folio_state() does not guarantee share==0 before
  calling, so the old precondition was no longer universally true.
* V2 patch 2 (three fixes): Split into three separate patches (patches
  2-4) per Dave's review.
* V2 patch 3 (two fixes): Split into two separate patches (patches 5-6)
  per Dave's review.
* V2 patch 4 (clamp direct_access / remove cached_size): Dropped.
  Dave's analysis correctly showed the claimed bug does not exist --
  dax_pgoff_to_phys() already enforces that the full requested size fits
  within a single range before returning, making the clamp a no-op in
  every reachable path.
* V2 patch 5 (holder_ops race): Use WRITE_ONCE() for the holder_ops
  store; add WARN_ON() on the cmpxchg result to catch wrong-holder and
  double-put API contract violations; fix the inline comment, which
  incorrectly claimed dax_holder_notify_failure() consults holder_ops
  only when holder_data is non-NULL.
* V2 patch 6 (dax_dev_find): Add dax_alive() check under dax_read_lock()
  after ilookup5() to prevent returning a device that is concurrently
  being torn down by kill_dax().
* V2 patch 7 (formatting cleanup): Drop incorrect Fixes: tag; add
  Dave's Reviewed-by.
* The series grows from 7 to 9 patches.

Changes since V1:
* Dropped the mods in patch 6 to fs/fuse/famfs.c and
  fs/famfs/famfs_inode.c; those files are not upstream, so the mods
  broke attempts to apply the series. Oops...
* Added patch 7, which addresses a previously-missed review comment
  from Jonathan - minor cleanup



John Groves (9):
  dax: fix misleading comment about share/index union in
    dax_folio_reset_order()
  dax/fsdev: fix multi-range offset in memory_failure handler
  dax/fsdev: clear vmemmap_shift when binding static pgmap
  dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure
  dax/fsdev: use __va(phys) for kaddr in direct_access
  dax/fsdev: fail probe on invalid pgmap offset
  dax: read holder_ops once in dax_holder_notify_failure()
  dax: fix holder_ops race in fs_put_dax()
  dax: fsdev.c minor formatting cleanup

 drivers/dax/dax-private.h |   2 -
 drivers/dax/fsdev.c       | 126 +++++++++++++++++++++++++-------------
 drivers/dax/super.c       |  53 ++++++++++++++--
 fs/dax.c                  |  12 ++--
 4 files changed, 136 insertions(+), 57 deletions(-)


base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
-- 
2.53.0


