Alison Schofield wrote: > On Wed, Mar 11, 2026 at 02:37:46PM -0700, Dan Williams wrote: > > Smita Koralahalli wrote: > > > __cxl_decoder_detach() currently resets decoder programming whenever a > > > region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For > > > autodiscovered regions, this can incorrectly tear down decoder state > > > that may be relied upon by other consumers or by subsequent ownership > > > decisions. > > > > > > Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is > > > set. > > > > > > Signed-off-by: Smita Koralahalli <[email protected]> > > > Reviewed-by: Jonathan Cameron <[email protected]> > > > Reviewed-by: Dave Jiang <[email protected]> > > > Reviewed-by: Alejandro Lucero <[email protected]> > > > --- > > > drivers/cxl/core/region.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > > > index ae899f68551f..45ee598daf95 100644 > > > --- a/drivers/cxl/core/region.c > > > +++ b/drivers/cxl/core/region.c > > > @@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr, > > > cxled->part = -1; > > > > > > if (p->state > CXL_CONFIG_ACTIVE) { > > > - cxl_region_decode_reset(cxlr, p->interleave_ways); > > > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) > > > + cxl_region_decode_reset(cxlr, p->interleave_ways); > > > + > > > p->state = CXL_CONFIG_ACTIVE; > > > > Hi Dan, > > > tl;dr: I do not think we need this, I do think we need to clarify to > > users what enable/disable and/or hot remove violence is handled and not > > handled by the CXL core. > > I'm chiming in here because although this patch is no longer needed for > this series, it has become a dependency for the Type 2 series.
Like I replied to Alejandro it is not a dependency for the type-2 series [1]. It *is* a fix for the issue reported by PJ, but it can go in independent of the base type-2 work as a standalone capability. [1]: http://lore.kernel.org/[email protected] > So this > follow-up focuses on the hot-remove, endpoint-detach case where > preserving decoders across detach is still needed for later recovery. > > Some inline responses to you, and then a diff is appended for a > direction check. > > > So this looks deceptively simple, but I think it is incomplete or at > > least adds to the current confusion. A couple points to consider: > > > > 1/ There is no corresponding clear_bit(CXL_REGION_F_AUTO, ...) anywhere in > > the driver. Yes, admin can still force cxl_region_decode_reset() via > > commit_store() path, but admin can not force > > cxl_region_teardown_targets() in the __cxl_decoder_detach() path. I do > > not like that this causes us to end up with 2 separate considerations > > for when __cxl_decoder_detach() skips cleanup actions > > (cxl_region_teardown_targets() and cxl_region_decode_reset()). See > > below, I think the cxl_region_teardown_targets() check is probably > > bogus. > > Rather than repurposing CXL_REGION_F_AUTO, this splits decode-reset policy > from AUTO. A new region-scoped CXL_REGION_F_PRESERVE_DECODE flag is introduced > and cleared on explicit decommit in commit_store(). AUTO remains > origin/assembly > state. Just like the decoder LOCK bit the preservation setting is a decoder property, not a region property. Region auto-assembly is then just an automatic way to set that decoder policy. So, no, I would not expect a new region flag for this policy. > This does still leave two cleanup decisions: > 1) decode reset (now keyed off PRESERVE_DECODE) > 2) target teardown (still using existing AUTO behavior) > > No change to cxl_region_teardown_targets() in this step. Turns out that cxl_region_teardown_targets() never needed to consider the CXL_F_REGION_AUTO flag. > > At a minimum I think commit_store() should clear CXL_REGION_F_AUTO on > > decommit such that cleaning up decoders and targets later proceeds as > > expected. > > This point is addressed by clearing CXL_REGION_F_PRESERVE_DECODE instead. > Explicit decommit is treated as destructive and disables decode preservation > before unbind/reset. > > > > > 2/ The hard part about CXL region cleanup is that it needs to be prepared > > for: > > > > a/ user manually removes the region via sysfs > > > > b/ user manually disables cxl_port, cxl_mem, or cxl_acpi causing the > > endpoint port to be removed > > > > c/ user physically removes the memdev causing the endpoint port to be > > removed (CXL core can not tell the difference with 2b/ it just sees > > cxl_mem_driver::remove() operation invocation) > > > > d/ setup action fails and region setup is unwound > > Agreed. This change targets 2b, 2c. > > > > > The cxl_region_decode_reset() is in __cxl_decoder_detach() because of > > 2b/ and 2c/. No other chance to cleanup the decode topology once the > > endpoint decoders are on their way out of the system. > > Agreed. The reset remains. Proposed change only makes it conditional on > explicit region policy rather than AUTO. > > > > > In this case though the patch was generated back when we were committed > > to cleaning up failed to assemble regions, a new 2d/ case, right? > > However, in that case the decoder is not leaving the system. The > > questions that arrive from that analysis are: > > > > * Is this patch still needed now that there is no auto-cleanup? > > Not for this Soft Reserved series, but yes for Type2 hotplug. Type-2 hotplug is not the issue, it is boot-time configuration preservation over device reset which is a different challenge. > > * If this patch is still needed is it better to skip > > cxl_region_decode_reset() based on the 'enum cxl_detach_mode' rather > > than the CXL_REGION_F_AUTO flag? I.e. skip reset in the 2d/ case, or > > some other new general flag that says "please preserve hardware > > configuration". > > I looked at using and expanding the cxl_detach_mode enum and rejected as > not the right scope. The current detach mode is attached to an individual > detach operation, whereas preserve vs reset decision applies to the region > decode topology as a whole. If we expand detach mode for this region > wide policy, then may risk inconsistent handling across endpoint of the > same region. Just seemed wrong place. I could be missing another reason > why you looked at it. Regions are an emergent property from decoder settings. Decoder settings come from firmware, user actions, and with the type-2 series driver actions. Firmware, user and driver actions are per-decoder especially because the behavior needed here is similar to the decoder LOCK bit. Region assembly can set a default decoder policy, but the management of that decoder policy need not go through the region. Either way, settling this question can be post type-2 base series event, not a lead-in dependency. [..] > > It is helpful that violence has been the default so far. So it allows to > > introduce a decoder shutdown policy toggle where CXL_REGION_F_AUTO flags > > decoders as "preserve" by default. Region decommit clears that flag, > > and/or userspace can toggle that per endpoint decoder flag to determine > > what happens when decoders leave the system. That probably also wants > > some lockdown interaction such that root can not force unplug memory by > > unbinding a driver. > > As a step in the direction you suggest, AND aiming to address Type2 > need, here is what I'd like a direction check on: > > Start separating decode-reset policy rom CXL_REGION_F_AUTO: > - keep CXL_REGION_F_AUTO as origin / assembly semantics > - introduce CXL_REGION_F_PRESERVE_DECODE as a region-scoped policy Not yet convinced about this. > - initialize that policy from auto-assembly > - clear it on explicit decommit in commit_store() My expectation is still clear it on decoder configuration change, add an attribute to toggle it independent of changing the decoder configuration. > - use it to gate cxl_region_decode_reset() in __cxl_decoder_detach() cxl_region_decode_reset() just automates asking each decoder to carry out reset if the decoder policy allows. > The decode-reset decision is factored through a small helper, > cxl_region_preserve_decode(), so the policy can be extended independent > of the detach mechanics. Maybe overkill in this simple case, but I > wanted to acknowledge the 'policy' direction. Appreciate you pulling this together. I want to land type-2 with the existing expectation that unload is always destructive then circle back to address this additional detail because it is more than just decoder policy that needs to be managed. The type-2 driver may need help finding its platform firmware configured address range if a device reset destroyed the decoder settings.

