Smita Koralahalli wrote:
> __cxl_decoder_detach() currently resets decoder programming whenever a
> region is detached if cxl_config_state is beyond CXL_CONFIG_ACTIVE. For
> autodiscovered regions, this can incorrectly tear down decoder state
> that may be relied upon by other consumers or by subsequent ownership
> decisions.
>
> Skip cxl_region_decode_reset() during detach when CXL_REGION_F_AUTO is
> set.
>
> Signed-off-by: Smita Koralahalli <[email protected]>
> Reviewed-by: Jonathan Cameron <[email protected]>
> Reviewed-by: Dave Jiang <[email protected]>
> Reviewed-by: Alejandro Lucero <[email protected]>
> ---
> drivers/cxl/core/region.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ae899f68551f..45ee598daf95 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2178,7 +2178,9 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
> cxled->part = -1;
>
> if (p->state > CXL_CONFIG_ACTIVE) {
> - cxl_region_decode_reset(cxlr, p->interleave_ways);
> + if (!test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
> + cxl_region_decode_reset(cxlr, p->interleave_ways);
> +
> p->state = CXL_CONFIG_ACTIVE;
tl;dr: I do not think we need this, I do think we need to clarify to
users what enable/disable and/or hot remove violence is handled and not
handled by the CXL core.
So this looks deceptively simple, but I think it is incomplete or at
least adds to the current confusion. A couple points to consider:
1/ There is no corresponding clear_bit(CXL_REGION_F_AUTO, ...) anywhere in
the driver. Yes, admin can still force cxl_region_decode_reset() via
commit_store() path, but admin can not force
cxl_region_teardown_targets() in the __cxl_decoder_detach() path. I do
not like that this causes us to end up with 2 separate considerations
for when __cxl_decoder_detach() skips cleanup actions
(cxl_region_teardown_targets() and cxl_region_decode_reset()). See
below, I think the cxl_region_teardown_targets() check is probably
bogus.
At a minimum I think commit_store() should clear CXL_REGION_F_AUTO on
decommit such that cleaning up decoders and targets later proceeds as
expected.
2/ The hard part about CXL region cleanup is that it needs to be prepared
for:
a/ user manually removes the region via sysfs
b/ user manually disables cxl_port, cxl_mem, or cxl_acpi causing the
endpoint port to be removed
c/ user physically removes the memdev causing the endpoint port to be
removed (CXL core can not tell the difference with 2b/ it just sees
cxl_mem_driver::remove() operation invocation)
d/ setup action fails and region setup is unwound
The cxl_region_decode_reset() is in __cxl_decoder_detach() because of
2b/ and 2c/. No other chance to cleanup the decode topology once the
endpoint decoders are on their way out of the system.
In this case though the patch was generated back when we were committed
to cleaning up failed to assemble regions, a new 2d/ case, right?
However, in that case the decoder is not leaving the system. The
questions that arrive from that analysis are:
* Is this patch still needed now that there is no auto-cleanup?
* If this patch is still needed is it better to skip
cxl_region_decode_reset() based on the 'enum cxl_detach_mode' rather
than the CXL_REGION_F_AUTO flag? I.e. skip reset in the 2d/ case, or
some other new general flag that says "please preserve hardware
configuration".
* Should cxl_region_teardown_targets() also be caring about the
cxl_detach_mode rather than the auto flag? I actually think the
CXL_REGION_F_AUTO check in cxl_region_teardown_targets() is misplaced
and it was confusing "teardown targets" with "decode reset".
All of this wants some documentation to tell users that the rule is
"Hey, after any endpoint decoder has been seen by the CXL core, if you
remove that endpoint decoder by removing or disabling any of cxl_acpi,
cxl_mem, or cxl_port the CXL core *will* violently destroy the decode
configuration". Then think about whether this needs a way to specify
"skip decoder teardown" to disambiguate "the decoder is disappearing
logically, but not physically, keep its configuration". That allows
turning any manual configuration into an auto-configuration and has an
explicit rule for all regions rather than the current "auto regions are
special" policy.
It is helpful that violence has been the default so far. So it allows to
introduce a decoder shutdown policy toggle where CXL_REGION_F_AUTO flags
decoders as "preserve" by default. Region decommit clears that flag,
and/or userspace can toggle that per endpoint decoder flag to determine
what happens when decoders leave the system. That probably also wants
some lockdown interaction such that root can not force unplug memory by
unbinding a driver.