On 12/3/2025 4:50 PM, [email protected] wrote:
Smita Koralahalli wrote:
If CXL regions do not fully cover a Soft Reserved span, HMEM takes
ownership. Tear down overlapping CXL regions before allowing HMEM to
register and online the memory.
Add cxl_region_teardown() to walk CXL regions overlapping a span and
unregister them via devm_release_action() and unregister_region().
Force the region state back to CXL_CONFIG_ACTIVE before unregistering to
prevent the teardown path from resetting decoders HMEM still relies on
to create its dax and online memory.
Co-developed-by: Alison Schofield <[email protected]>
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Smita Koralahalli <[email protected]>
---
drivers/cxl/core/region.c | 38 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 5 +++++
drivers/dax/hmem/hmem.c | 4 +++-
3 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 38e7ec6a087b..266b24028df0 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3784,6 +3784,44 @@ struct cxl_range_ctx {
bool found;
};
+static int cxl_region_teardown_cb(struct device *dev, void *data)
+{
+ struct cxl_range_ctx *ctx = data;
+ struct cxl_root_decoder *cxlrd;
+ struct cxl_region_params *p;
+ struct cxl_region *cxlr;
+ struct cxl_port *port;
+
+ cxlr = cxlr_overlapping_range(dev, ctx->start, ctx->end);
+ if (!cxlr)
+ return 0;
+
+ cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ port = cxlrd_to_port(cxlrd);
+ p = &cxlr->params;
+
+ /* Force the region state back to CXL_CONFIG_ACTIVE so that
Minor, and moot given the follow on comments below, but please keep
consistent comment-style and lead with a /*, i.e.:
/*
* Force the region...
+ * unregister_region() does not run the full decoder reset path
+ * which would invalidate the decoder programming that HMEM
+ * relies on to create its DAX device and online the underlying
+ * memory.
+ */
+ scoped_guard(rwsem_write, &cxl_rwsem.region)
+ p->state = min(p->state, CXL_CONFIG_ACTIVE);
I think the thickness of the above comment belies that this is too much
of a layering violation and likely to cause problems. For minimizing the
mental load of analyzing future bug reports, I want all regions gone
when any handshake with the platform firmware and dax-hmem occurs. When
that happens it may mean destroying regions that were dynamically
created while waiting the wait_for_initial_probe() to timeout, who
knows. The simple policy is "CXL subsystem understands everything, or
touches nothing."
For this reset determination, what I think makes more sense, and is
generally useful for shutting down CXL even outside of the hmem deferral
trickery, is to always record whether decoders were idle or not at the
time of region creation. In fact we already have that flag, it is called
CXL_REGION_F_AUTO.
If CXL_REGION_F_AUTO is still set at detach_target() time, it means that
we are giving up on auto-assembly and leaving the decoders alone.
If the administrator actually wants to destroy and reclaim that
physical address space then they need to forcefully de-commit that
auto-assembled region via the @commit sysfs attribute. So that means
commit_store() needs to clear CXL_REGION_F_AUTO to get the decoder reset
to happen.
[..]
void cxl_endpoint_parse_cdat(struct cxl_port *port);
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index b9312e0f2e62..7d874ee169ac 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -158,8 +158,10 @@ static int handle_deferred_cxl(struct device *host, int
target_nid,
if (cxl_regions_fully_map(res->start, res->end)) {
dax_cxl_mode = DAX_CXL_MODE_DROP;
cxl_register_dax(res->start, res->end);
- } else
+ } else {
dax_cxl_mode = DAX_CXL_MODE_REGISTER;
+ cxl_region_teardown(res->start, res->end);
+ }
Like I alluded to above, I am not on board with making a range-by range
decision on teardown. The check for "all clear" vs "abort" should be a
global event before proceeding with either allowing cxl_region instances
to attach or all of them get destroyed. Recall that if
cxl_dax_region_probe() is globally rejecting all cxl_dax_region devices
until dax_cxl_mode moves to DAX_CXL_MODE_DROP then it keeps a consistent
behavior of all regions attach or none attach.
Thanks. I will restructure this and rely on CXL_REGION_F_AUTO semantics
for teardown.
I wanted to check one nuance to make sure Im applying this correctly.
Consider a case like:
Soft Reserved spans (HMAT-visible):
SR1: 0x085000–0x284fff
SR2: 0x285000–0x484fff
SR3: 0x485000–0x684fff
CXL regions:
R1: 0x085000–0x274fff
R2: 0x285000–0x484fff
R3: 0x485000–0x684fff
Here SR1 is not fully contained by any committed CXL region, even though
SR2 and SR3 are. In this scenario, my understanding is that the global
check should fail, and we abort CXL ownership entirely (tear down all
auto-assembled regions) and proceed with pure HMEM, rather than allowing
partial CXL ownership. Is that correct?
Related to that, the flow should be something like:
Inside process_defer_work()
process_defer_work()
wait_for_device_probe();
if (cxl_contains_soft_reserve())
dax_cxl_mode = DROP
rescan cxl-bus (bus_rescan_devices(&cxl_bus_type);)
else
dax_cxl_mode = REGISTER
teardown all regions.
walk_hmem_resources(&pdev->dev, hmem_register_device);
(Here we walk all hmem resources second time)..
and as you mentioned
cxl_contains_soft_reserve()
for_each_cxl_intersecting_hmem_resource()
(Here we walk all hmem resources first time)..
found = false
for_each_region()
if (resource_contains(cxl_region_resource, hmem_resource))
found = true
if (!found)
return false
return true
Thanks
Smita