From: John Groves <[email protected]>
This series applies bug fixes (mostly found via sashiko) to the dax/fsdev
series. It has been soaking in the famfs CI pipeline and 1) won't affect
anything that doesn't use drivers/dax/fsdev.c, and 2) doesn't affect any
known workloads -- although the bugs would have manifested when multi-range
DCD dax devices are a thing (soon-ish).
Most of the series is confined to drivers/dax/fsdev.c. Two patches touch
shared DAX core in drivers/dax/super.c: patch 7 reads holder_ops once in
dax_holder_notify_failure() to close a double-fetch NULL dereference, and
patch 8 reorders fs_put_dax() and adds a WARN_ON(). fs_put_dax() is used by
ext2/ext4/erofs/xfs, but only holder-passing callers (like XFS in-tree) will
see a behavior change, and only a new warning if they misuse it.
Changes since V4:
- New patch 7 (dax: read holder_ops once in dax_holder_notify_failure()):
split the reader-side READ_ONCE() fix out of the fs_put_dax() patch and
placed it first, so the fs_put_dax() patch's "a concurrent
dax_holder_notify_failure() that sees NULL ops returns -EOPNOTSUPP
cleanly" reasoning actually holds when it lands. dax_holder_notify_failure()
read holder_ops twice without READ_ONCE(); a concurrent clear could make
the NULL check pass while the indirect call dereferenced NULL. Carries
Fixes: 8012b86608552 ("dax: introduce holder for dax_device"), the commit
that introduced the unmarked double fetch. Suggested by Richard Cheng (and
the Sashiko bot).
- Patch 2 (multi-range memory_failure offset): the ->memory_failure callback
now walks the pagemap's own immutable range array (pgmap->ranges[]) rather
than dev_dax->ranges[], which a concurrent sysfs mapping_store() can
krealloc() under dax_region_rwsem while this callback holds no such lock.
For dynamic devices the two arrays are identical, so the reported offset is
unchanged for the multi-range case this patch targets. Suggested by Richard
Cheng (and the Sashiko bot).
- Dropped the dax_dev_get()/dax_dev_find() patch (V4 patch 8) from this
revision. There is no in-tree caller yet; it will be sent together with the
famfs filesystem series that introduces the caller. (Per Richard Cheng /
Sashiko.)
- Patch 8 (holder_ops race in fs_put_dax()): unchanged from V4 (renumbered
from 7 to 8).
- Collected Reviewed-by from Dave Jiang on patches 4 and 6.
Changes since V3:
- Patch 4: Adopted Dave's suggested refactor -- factor out
fsdev_acquire_pgmap() and defer the dev_dax->pgmap assignment until
probe can no longer fail, replacing the goto-based cleanup. Did not
carry Alison's V3 Reviewed-by due to the rewrite.
- Patch 5: Also remove the now write-only dev_dax->virt_addr field,
per Dave's review.
- Patch 7: Fixed the WARN_ON() to tolerate holder_data == NULL, which
legitimately occurs when kill_dax() clears it during device removal
under a live holder (per Dave's review). Wrong-holder calls still
warn.
- Patch 8: Kept the Fixes tag -- the exported symbol itself is the
hazard; stable kernels carrying the export should want this fix.
Changes since V2:
* Patch 1 (comment fix): No change. Responded to Dave's question about
the dropped precondition -- the new comment correctly covers both
callers; fsdev_clear_folio_state() does not guarantee share==0 before
calling, so the old precondition was no longer universally true.
* V2 patch 2 (three fixes): Split into three separate patches (patches
2-4) per Dave's review.
* V2 patch 3 (two fixes): Split into two separate patches (patches 5-6)
per Dave's review.
* V2 patch 4 (clamp direct_access / remove cached_size): Dropped.
Dave's analysis correctly showed the claimed bug does not exist --
dax_pgoff_to_phys() already enforces that the full requested size fits
within a single range before returning, making the clamp a no-op in
every reachable path.
* V2 patch 5 (holder_ops race): Use WRITE_ONCE() for the holder_ops
store; add WARN_ON() on the cmpxchg result to catch wrong-holder and
double-put API contract violations; fix the inline comment, which
incorrectly claimed dax_holder_notify_failure() consults holder_ops
only when holder_data is non-NULL.
* V2 patch 6 (dax_dev_find): Add dax_alive() check under dax_read_lock()
after ilookup5() to prevent returning a device that is concurrently
being torn down by kill_dax().
* V2 patch 7 (formatting cleanup): Drop incorrect Fixes: tag; add
Dave's Reviewed-by.
* The series grows from 7 to 9 patches.
Changes since v1:
* Dropped modes from patch 6 to fs/fuse/famfs.c and
fs/famfs/famfs_inode.c, which are not upstream so it broke
attempts to apply the series. Oops...
* Added patch 7, which addresses a previously-missed review comment
from Jonathan - minor cleanup
John Groves (9):
dax: fix misleading comment about share/index union in
dax_folio_reset_order()
dax/fsdev: fix multi-range offset in memory_failure handler
dax/fsdev: clear vmemmap_shift when binding static pgmap
dax/fsdev: don't leave a dangling dev_dax->pgmap on probe failure
dax/fsdev: use __va(phys) for kaddr in direct_access
dax/fsdev: fail probe on invalid pgmap offset
dax: read holder_ops once in dax_holder_notify_failure()
dax: fix holder_ops race in fs_put_dax()
dax: fsdev.c minor formatting cleanup
drivers/dax/dax-private.h | 2 -
drivers/dax/fsdev.c | 126 +++++++++++++++++++++++++-------------
drivers/dax/super.c | 53 ++++++++++++++--
fs/dax.c | 12 ++--
4 files changed, 136 insertions(+), 57 deletions(-)
base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
--
2.53.0