From: John Groves <[email protected]>

Fix memory_failure offset calculation for multi-range devices. The old
code subtracted ranges[0].range.start from the faulting PFN's physical
address, which produces an incorrect (inflated) logical offset when the
PFN falls in ranges[1] or beyond due to physical gaps between ranges.
Add fsdev_pfn_to_offset() to walk the range list and compute the correct
device-linear byte offset relative to ranges[0].start (the device data
start) -- the base the holder (xfs, famfs) maps from -- for both static
and dynamic devices.
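
To make the arithmetic concrete, here is a minimal userspace sketch of
the same walk. It is an illustration, not the driver code: the ex_*
names, the two-range layout and the addresses are hypothetical, and
struct ex_range only mimics the kernel's struct range (inclusive end).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct range (end is inclusive). */
struct ex_range {
	uint64_t start;
	uint64_t end;
};

static uint64_t ex_range_len(const struct ex_range *r)
{
	return r->end - r->start + 1;
}

/* Old calculation: distance from the start of the first range only. */
static uint64_t ex_old_offset(const struct ex_range *ranges, uint64_t phys)
{
	return phys - ranges[0].start;
}

/*
 * New calculation: sum the lengths of the ranges that precede the one
 * containing phys, then add the position inside that range.
 * Returns UINT64_MAX when phys lies in no range.
 */
static uint64_t ex_phys_to_offset(const struct ex_range *ranges, size_t nr,
				  uint64_t phys)
{
	uint64_t offset = 0;

	for (size_t i = 0; i < nr; i++) {
		if (phys >= ranges[i].start && phys <= ranges[i].end)
			return offset + (phys - ranges[i].start);
		offset += ex_range_len(&ranges[i]);
	}
	return UINT64_MAX;
}

/* Assumed layout: two 1 GiB ranges separated by a 3 GiB physical gap. */
static const struct ex_range ex_ranges[] = {
	{ 0x100000000ULL, 0x13fffffffULL },
	{ 0x200000000ULL, 0x23fffffffULL },
};

static void ex_demo(void)
{
	uint64_t phys = 0x200001000ULL;	/* 4 KiB into the second range */

	/* Old code counts the gap: 4 GiB + 4 KiB. */
	assert(ex_old_offset(ex_ranges, phys) == 0x100001000ULL);
	/* Range walk skips the gap: 1 GiB + 4 KiB. */
	assert(ex_phys_to_offset(ex_ranges, 2, phys) == 0x40001000ULL);
	/* An address inside the gap belongs to no range. */
	assert(ex_phys_to_offset(ex_ranges, 2, 0x180000000ULL) == UINT64_MAX);
}
```

With this assumed layout the old subtraction overshoots by exactly the
size of the gap (3 GiB), which is the inflation described above.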

V5 walked the pagemap's immutable pgmap->ranges[] instead, to avoid
reading the mutable dev_dax->ranges[] from this callback. That had a
different problem: it regressed static devices, where
pgmap->ranges[0].start can sit data_offset bytes below the data start,
so the reported offset came out data_offset bytes too high and the
holder would act on the wrong blocks. For dynamic devices the two arrays
are identical, so pgmap->ranges[] only ever helped the dynamic case
while breaking the static one. Walk dev_dax->ranges[] instead. (Richard
Cheng spotted the static regression.)

Reading dev_dax->ranges[] here may race a concurrent krealloc() of the
range array via sysfs (mapping_store(), under dax_region_rwsem, which
this ->memory_failure callback does not hold). That exposure is
pre-existing -- the original single-range code read dev_dax->ranges[0]
locklessly as well -- so this patch does not make it worse; a proper fix
(locking or snapshotting) belongs in a separate change.

Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")
Reviewed-by: Dave Jiang <[email protected]>
Reviewed-by: Alison Schofield <[email protected]>
Signed-off-by: John Groves <[email protected]>
---
 drivers/dax/fsdev.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
index 188b2526bee45..f315533b299e9 100644
--- a/drivers/dax/fsdev.c
+++ b/drivers/dax/fsdev.c
@@ -135,11 +135,26 @@ static void fsdev_clear_ops(void *data)
  * The core mm code in free_zone_device_folio() handles the wake_up_var()
  * directly for this memory type.
  */
+static u64 fsdev_pfn_to_offset(struct dev_dax *dev_dax, unsigned long pfn)
+{
+       phys_addr_t phys = PFN_PHYS(pfn);
+       u64 offset = 0;
+
+       for (int i = 0; i < dev_dax->nr_range; i++) {
+               struct range *range = &dev_dax->ranges[i].range;
+
+               if (phys >= range->start && phys <= range->end)
+                       return offset + (phys - range->start);
+               offset += range_len(range);
+       }
+       return -1ULL;
+}
+
 static int fsdev_pagemap_memory_failure(struct dev_pagemap *pgmap,
                unsigned long pfn, unsigned long nr_pages, int mf_flags)
 {
        struct dev_dax *dev_dax = pgmap->owner;
-       u64 offset = PFN_PHYS(pfn) - dev_dax->ranges[0].range.start;
+       u64 offset = fsdev_pfn_to_offset(dev_dax, pfn);
        u64 len = nr_pages << PAGE_SHIFT;
 
        return dax_holder_notify_failure(dev_dax->dax_dev, offset,
-- 
2.53.0

