On 26/06/08 06:52PM, Richard Cheng wrote: > On Sun, Jun 07, 2026 at 07:34:10PM +0800, John Groves wrote: > > From: John Groves <[email protected]> > > > > Clear holder_ops before holder_data so that a concurrent fs_dax_get() > > cannot have its newly installed holder_ops overwritten. cmpxchg() > > provides release ordering on weakly-ordered architectures, ensuring the > > WRITE_ONCE(holder_ops, NULL) store is visible to any CPU that observes > > the holder_data release. > > > > Add a WARN_ON() that fires only when the cmpxchg observes a non-NULL > > value that is not @holder, i.e. fs_put_dax() called by something that > > is not the current holder. That is an API contract violation; the > > WARN_ON() does not prevent the damage but makes the bug visible. > > > > A NULL cmpxchg result is deliberately tolerated: kill_dax() clears > > holder_data while a holder is still attached when a device is removed > > out from under a mounted filesystem (after delivering MF_MEM_PRE_REMOVE). > > The holder's subsequent fs_put_dax() - e.g. xfs_free_buftarg() after a > > forced shutdown - then legitimately finds holder_data already NULL, so > > warning on that case would turn supported device removal into a splat > > (or a panic with panic_on_warn). > > > > Also add a kerneldoc comment documenting that fs_put_dax() must only > > be called by the current holder. > > > > Fixes: eec38f5d86d27 ("dax: Add fs_dax_get() func to prepare dax for fs-dax > > usage") > > Signed-off-by: John Groves <[email protected]> > > --- > > drivers/dax/super.c | 42 +++++++++++++++++++++++++++++++++++++++--- > > 1 file changed, 39 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > > index 25cf99dd9360b..96f778dcde50b 100644 > > --- a/drivers/dax/super.c > > +++ b/drivers/dax/super.c > > @@ -116,11 +116,47 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev); > > > > #if IS_ENABLED(CONFIG_FS_DAX) > > > > +/** > > + * fs_put_dax() - release holder ownership of a dax_device > > + * @dax_dev: dax device to release (may be NULL) > > + * @holder: the holder pointer previously passed to fs_dax_get() or > > + * fs_dax_get_by_bdev(); must match exactly, as it is used > > + * in a cmpxchg to atomically release ownership > > + * > > + * Must only be called by the current holder. Clears holder_ops before > > + * holder_data to avoid a race where a concurrent fs_dax_get() could have > > + * its newly installed holder_ops overwritten. > > + */ > > void fs_put_dax(struct dax_device *dax_dev, void *holder) > > { > > - if (dax_dev && holder && > > - cmpxchg(&dax_dev->holder_data, holder, NULL) == holder) > > - dax_dev->holder_ops = NULL; > > + if (dax_dev && holder) { > > + void *prev; > > + > > + /* > > + * Clear holder_ops before releasing holder_data. A concurrent > > + * dax_holder_notify_failure() that sees NULL ops returns > > + * -EOPNOTSUPP cleanly. A concurrent fs_dax_get() that acquires > > + * holder_data after the cmpxchg below is guaranteed to observe > > + * holder_ops=NULL first (cmpxchg provides release ordering), so > > + * its subsequent store of new ops will not be overwritten. > > + */ > > This isn't guaranteed today. dax-holder_notify_failure() reads > dax_dev->holder_ops twice without READ_ONCE(). With your WRITE_ONCE() > racing in between, the second read "dax_dev->holder_ops->notify_failure()" can > return NULL and result in NULL deref, so the "see NULL cleanly" property the > comment relies > on doesn't hold. > > Or reading it once into a local would make it tru > """ > const struct dax_holder_operations *ops = READ_ONCE(dax_dev->holder_ops); > > if (!ops) > return -EOPNOTSUPP; > rc = ops->notify_failure(dax_dev, off, len, mf_flags); > """ > > What do you think ?
Another good catch. Adding a fix to dax_holder_notify_failure(), to get the ops via READ_ONCE(). Thanks, John <snip>

