On 7/18/25 5:30 AM, Neeraj Kumar wrote:
> On 09/07/25 05:38PM, Dave Jiang wrote:
>>
>>
>> On 6/17/25 5:39 AM, Neeraj Kumar wrote:
>>> In 84ec985944ef3, For cxl pmem region auto-assembly after endpoint port
>>> probing, cxl_nvd presence was required. And for cxl region persistency,
>>> region creation happens during nvdimm_probe which need the completion
>>> of endpoint probe.
>>>
>>> It is therefore refactored cxl pmem region auto-assembly after endpoint
>>> probing to cxl mem probing
>>
>> The region auto-assembly is moved after the endpoint device is added.
>> However, it's not guaranteed that the endpoint probe has completed and
>> completed successfully. You are just getting lucky timing wise that endpoint
>> probe is completed when the new region discovery is starting.
>
> Hi Dave,
>
> For region auto-assembly two things are required
> 1. endpoint(s) decoder should be enumerated successfully
> 2. As per fix in 84ec985944ef3, The cxl_nvd of the memdev needs to be
> available during the pmem region probe
>
> Prior to current patch, region auto-assembly was happening from
> cxl_endpoint_port_probe() after endpoint decoder enumeration.
> In 84ec985944ef3, sequence of devm_cxl_add_nvdimm() was moved before
> devm_cxl_add_endpoint() so as to meet region auto assembly dependency on
> cxl_nvd of the memdev.
>
> Here, In this patch I have moved region auto-assembly after
> 1. devm_cxl_add_endpoint(): This function makes sure
> cxl_endpoint_port_probe() has completed successfully, as per below check in
> devm_cxl_add_endpoint()
> if (!endpoint->dev.driver) {
> dev_err(&cxlmd->dev, "%s failed probe\n",
> dev_name(&endpoint->dev));
> return -ENXIO;
> }
> And successfull completion of cxl_endpoint_port_probe(), it must have
> enumerated endpoint(s) decoder successfully
> 2. devm_cxl_add_nvdimm(): As you rightly said, this allocates "cxl_nvd"
> nvdimm device and triggers the async probe of nvdimm driver
>
> Actually in this patch, from async probe function (cxl_nvdimm_probe()), I am
> creating "struct nvdimm" using __nvdimm_create()
> This __nvdimm_create() ultimately scans LSA. If LSA finds region label into
> it then it saves region information into struct nvdimm
> and then using create_pmem_region(nvdimm), I am re-creating cxl region for
> region persistency.
>
> As for cxl region persistency (based on LSA 2.1 scanning - this patch)
> following sequence should be required
> 1. devm_cxl_add_endpoint(): endpoint probe completion - which is getting
> done by devm_cxl_add_endpoint()
> 2. devm_cxl_add_nvdimm(): Here after nvdimm device creation, cxl region is
> being created
>
> It is therefore re-sequencing of region-auto assembly is required to move
> from cxl_endpoint_port_probe() to after
> devm_cxl_add_endpoint() and devm_cxl_add_nvdimm()
>
>> Return of devm_cxl_add_nvdimm() only adds the nvdimm device and triggers the
>> async probe of nvdimm driver. You have to take the endpoint port lock
>
> I think we may not require endpoint port lock as auto-assembly and region
> persistency code sequence is always after successful completion of endpoint
> probe.
>
>> and check if the driver is attached to be certain that endpoint probe is
>> done and successful. Therefore moving the region discovery location probably
>> does not do what you think it does.
>> Maybe take a deeper look at the region discovery code and see how it does
>> retry if things are not present and approach it from that angle?
>>
>
> Please let me know if my understanding is correct or am I missing something?
No I think your assumptions are fine. I incorrectly assumed that the cxl_port
driver is probed async. But in reality, only the cxl_pci driver is probed
async. So devm_cxl_add_endpoint() will ensure that the port driver is attached
and the rest should work as what you have setup for nvdimm. Sorry about the
noise.
DJ
>
>>>
>>> Signed-off-by: Neeraj Kumar <s.nee...@samsung.com>
>>> ---
>>> drivers/cxl/core/port.c | 38 ++++++++++++++++++++++++++++++++++++++
>>> drivers/cxl/cxl.h | 1 +
>>> drivers/cxl/mem.c | 27 ++++++++++++++++++---------
>>> drivers/cxl/port.c | 38 --------------------------------------
>>> 4 files changed, 57 insertions(+), 47 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>>> index 78a5c2c25982..bca668193c49 100644
>>> --- a/drivers/cxl/core/port.c
>>> +++ b/drivers/cxl/core/port.c
>>> @@ -1038,6 +1038,44 @@ void put_cxl_root(struct cxl_root *cxl_root)
>>> }
>>> EXPORT_SYMBOL_NS_GPL(put_cxl_root, "CXL");
>>>
>>> +static int discover_region(struct device *dev, void *root)
>>> +{
>>> + struct cxl_endpoint_decoder *cxled;
>>> + int rc;
>>> +
>>> + if (!is_endpoint_decoder(dev))
>>> + return 0;
>>> +
>>> + cxled = to_cxl_endpoint_decoder(dev);
>>> + if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0)
>>> + return 0;
>>> +
>>> + if (cxled->state != CXL_DECODER_STATE_AUTO)
>>> + return 0;
>>> +
>>> + /*
>>> + * Region enumeration is opportunistic, if this add-event fails,
>>> + * continue to the next endpoint decoder.
>>> + */
>>> + rc = cxl_add_to_region(root, cxled);
>>> + if (rc)
>>> + dev_dbg(dev, "failed to add to region: %#llx-%#llx\n",
>>> + cxled->cxld.hpa_range.start, cxled->cxld.hpa_range.end);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +void cxl_region_discovery(struct cxl_port *port)
>>> +{
>>> + struct cxl_port *root;
>>> + struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);
>>> +
>>> + root = &cxl_root->port;
>>> +
>>> + device_for_each_child(&port->dev, root, discover_region);
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_region_discovery, "CXL");
>>> +
>>
>> I have concerns about adding region related code in core/port.c while the
>> rest of the region code is walled behind CONFIG_CXL_REGION. I think this
>> change needs to go to core/region.c.
>>
>> DJ
>>
>
> Sure Dave, I will move it into core/region.c in next patch-set.
>
> Regards,
> Neeraj
>