On Sun, Jun 07, 2026 at 07:34:10PM +0800, John Groves wrote:
> From: John Groves <[email protected]>
> 
> Clear holder_ops before holder_data so that a concurrent fs_dax_get()
> cannot have its newly installed holder_ops overwritten. cmpxchg()
> provides release ordering on weakly-ordered architectures, ensuring the
> WRITE_ONCE(holder_ops, NULL) store is visible to any CPU that observes
> the holder_data release.
> 
> Add a WARN_ON() that fires only when the cmpxchg observes a non-NULL
> value that is not @holder, i.e. fs_put_dax() called by something that
> is not the current holder. That is an API contract violation; the
> WARN_ON() does not prevent the damage but makes the bug visible.
> 
> A NULL cmpxchg result is deliberately tolerated: kill_dax() clears
> holder_data while a holder is still attached when a device is removed
> out from under a mounted filesystem (after delivering MF_MEM_PRE_REMOVE).
> The holder's subsequent fs_put_dax() - e.g. xfs_free_buftarg() after a
> forced shutdown - then legitimately finds holder_data already NULL, so
> warning on that case would turn supported device removal into a splat
> (or a panic with panic_on_warn).
> 
> Also add a kerneldoc comment documenting that fs_put_dax() must only
> be called by the current holder.
> 
> Fixes: eec38f5d86d27 ("dax: Add fs_dax_get() func to prepare dax for fs-dax 
> usage")
> Signed-off-by: John Groves <[email protected]>
> ---
>  drivers/dax/super.c | 42 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 25cf99dd9360b..96f778dcde50b 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -116,11 +116,47 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
>  
>  #if IS_ENABLED(CONFIG_FS_DAX)
>  
> +/**
> + * fs_put_dax() - release holder ownership of a dax_device
> + * @dax_dev: dax device to release (may be NULL)
> + * @holder: the holder pointer previously passed to fs_dax_get() or
> + *          fs_dax_get_by_bdev(); must match exactly, as it is used
> + *          in a cmpxchg to atomically release ownership
> + *
> + * Must only be called by the current holder. Clears holder_ops before
> + * holder_data to avoid a race where a concurrent fs_dax_get() could have
> + * its newly installed holder_ops overwritten.
> + */
>  void fs_put_dax(struct dax_device *dax_dev, void *holder)
>  {
> -     if (dax_dev && holder &&
> -         cmpxchg(&dax_dev->holder_data, holder, NULL) == holder)
> -             dax_dev->holder_ops = NULL;
> +     if (dax_dev && holder) {
> +             void *prev;
> +
> +             /*
> +              * Clear holder_ops before releasing holder_data. A concurrent
> +              * dax_holder_notify_failure() that sees NULL ops returns
> +              * -EOPNOTSUPP cleanly. A concurrent fs_dax_get() that acquires
> +              * holder_data after the cmpxchg below is guaranteed to observe
> +              * holder_ops=NULL first (cmpxchg provides release ordering), so
> +              * its subsequent store of new ops will not be overwritten.
> +              */

This isn't guaranteed today. dax-holder_notify_failure() reads
dax_dev->holder_ops twice without READ_ONCE(). With your WRITE_ONCE()
racing in between, the second read "dax_dev->holder_ops->notify_failure()" can
return NULL and result in NULL deref, so the "see NULL cleanly" property the 
comment relies
on doesn't hold.

Or reading it once into a local would make it tru
"""
const struct dax_holder_operations *ops = READ_ONCE(dax_dev->holder_ops);

if (!ops)
        return -EOPNOTSUPP;
rc = ops->notify_failure(dax_dev, off, len, mf_flags);
"""

What do you think ?

--Richard


> +             WRITE_ONCE(dax_dev->holder_ops, NULL);
> +             prev = cmpxchg(&dax_dev->holder_data, holder, NULL);
> +
> +             /*
> +              * prev == holder: normal release.
> +              * prev == NULL:   already released by kill_dax() when the
> +              *                 device was removed under a live holder;
> +              *                 not a bug.
> +              * prev != holder (non-NULL): fs_put_dax() called by something
> +              *                 that is not the current holder; an API
> +              *                 contract violation. A lock would be needed
> +              *                 to guard against this, but we WARN_ON()
> +              *                 instead since violating the contract is
> +              *                 a bug.
> +              */
> +             WARN_ON(prev && prev != holder);
> +     }
>       put_dax(dax_dev);
>  }
>  EXPORT_SYMBOL_GPL(fs_put_dax);
> -- 
> 2.53.0
> 
> 
> 

Reply via email to