On Sun, 13 Apr 2025, Ira Weiny wrote:
Dynamic capacity device extents may be left in an accepted state on a
device due to an unexpected host crash. In this case it is expected
that the creation of a new region on top of a DC partition can read
those extents and surface them for continued use.
Once all endpoint decoders are part of a region and the region is being
realized, a read of the device's extent list can reveal these
previously accepted extents.
CXL r3.1 specifies the mailbox call Get Dynamic Capacity Extent List for
this purpose. The call returns all the extents for all dynamic capacity
partitions. If the fabric manager is adding extents to any DCD
partition, the extent list for the recovered region may change. In this
case the query must retry. Upon retry the query could encounter extents
which were accepted on a previous list query. Adding such extents is
ignored without error because they lie entirely within a previously
accepted extent. Warn in this case, rather than fail, so that bad
devices can be differentiated from this normal condition.
Latch any errors to be bubbled up to ensure notification to the user
even if individual errors are rate limited or otherwise ignored.
The scan for existing extents races with the dax_cxl driver. This is
synchronized through the region device lock. Extents which are found
after the driver has loaded will surface through the normal notification
path while extents seen prior to the driver are read during driver load.
Based on an original patch by Navneet Singh.
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Fan Ni <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
Changes:
[0day: fix extent count in GetExtent input payload]
[iweiny: minor clean ups]
[iweiny: Adjust for partition arch]
---
drivers/cxl/core/core.h | 1 +
drivers/cxl/core/mbox.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/cxl/core/region.c | 25 +++++++++++
drivers/cxl/cxlmem.h | 21 +++++++++
4 files changed, 156 insertions(+)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 027dd1504d77..e06a46fec217 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -22,6 +22,7 @@ cxled_to_mds(struct cxl_endpoint_decoder *cxled)
return container_of(cxlds, struct cxl_memdev_state, cxlds);
}
+int cxl_process_extent_list(struct cxl_endpoint_decoder *cxled);
int cxl_region_invalidate_memregion(struct cxl_region *cxlr);
#ifdef CONFIG_CXL_REGION
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index de01c6684530..8af3a4173b99 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1737,6 +1737,115 @@ int cxl_dev_dc_identify(struct cxl_mailbox *mbox,
}
EXPORT_SYMBOL_NS_GPL(cxl_dev_dc_identify, "CXL");
+/* Return -EAGAIN if the extent list changes while reading */
+static int __cxl_process_extent_list(struct cxl_endpoint_decoder *cxled)
+{
+ u32 current_index, total_read, total_expected, initial_gen_num;
+ struct cxl_memdev_state *mds = cxled_to_mds(cxled);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+ struct device *dev = mds->cxlds.dev;
+ struct cxl_mbox_cmd mbox_cmd;
+ u32 max_extent_count;
+ int latched_rc = 0;
+ bool first = true;
+
+ struct cxl_mbox_get_extent_out *extents __free(kvfree) =
+ kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
+ if (!extents)
+ return -ENOMEM;
+
+ total_read = 0;
+ current_index = 0;
+ total_expected = 0;
+ max_extent_count = (cxl_mbox->payload_size - sizeof(*extents)) /
+ sizeof(struct cxl_extent);
+ do {
+ u32 nr_returned, current_total, current_gen_num;
+ struct cxl_mbox_get_extent_in get_extent;
+ int rc;
+
+ get_extent = (struct cxl_mbox_get_extent_in) {
+ .extent_cnt = cpu_to_le32(max(max_extent_count,
+ total_expected -
+ current_index)),
s/max()/min()/ here: the request should not ask for more extents than
fit in one payload.
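To illustrate the clamp, here is a minimal standalone sketch, not the patch's code. The helper name extent_request_count() and its 'first' parameter are hypothetical, and requesting a full payload on the first query (when the total is not yet known) is an assumption on my part:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: smaller of two unsigned 32-bit values. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: number of extents to request in one query.
 *
 * max_extent_count - extents that fit in a single mailbox payload
 * total_expected   - total reported by the device on an earlier query
 * current_index    - index of the next extent to read
 * first            - true when no total is known yet (assumption: ask
 *                    for a full payload in that case)
 */
static uint32_t extent_request_count(uint32_t max_extent_count,
				     uint32_t total_expected,
				     uint32_t current_index, bool first)
{
	if (first)
		return max_extent_count;

	/* Nothing left to read; also avoids unsigned underflow below. */
	if (current_index >= total_expected)
		return 0;

	/* Never request more than one payload can hold. */
	return min_u32(max_extent_count, total_expected - current_index);
}
```

With max() the request can exceed max_extent_count once the remaining count is larger than one payload; with min() it is capped, but the first iteration (total_expected == 0) needs the kind of special handling sketched above.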
+ .start_extent_index = cpu_to_le32(current_index),