On 29/08/2025 07:21, Koralahalli Channabasappa, Smita wrote:
> Hi Zhijian,
>
> On 8/26/2025 11:30 PM, Zhijian Li (Fujitsu) wrote:
>> All,
>>
>>
>> I have confirmed that in the !CXL_REGION configuration, the same environment
>> may fail to fall back to hmem. (Your new patch cannot resolve this issue)
>>
>> In my environment:
>> - There are two CXL memory devices corresponding to:
>>   ```
>>   5d0000000-6cfffffff : CXL Window 0
>>   6d0000000-7cfffffff : CXL Window 1
>>   ```
>> - E820 table contains a 'soft reserved' entry:
>>   ```
>>   [    0.000000] BIOS-e820: [mem 0x00000005d0000000-0x00000007cfffffff] soft reserved
>>   ```
>>
>> However, since my ACPI SRAT doesn't describe the CXL memory devices (the
>> point), `acpi/hmat.c` won't allocate memory targets for them. This prevents
>> the call chain:
>> ```c
>> hmat_register_target_devices() // for each SRAT-described target
>>   -> hmem_register_resource()
>>     -> insert entry into "HMEM devices" resource
>> ```
>>
>> Therefore, for successful fallback to hmem in this environment:
>> `dax_hmem.ko` and `kmem.ko` must request resources BEFORE `cxl_acpi.ko`
>> inserts 'CXL Window X'
>>
>> However the kernel cannot guarantee this initialization order.
>>
>> When cxl_acpi runs before dax_kmem/kmem:
>> ```
>> (built-in)                 CXL_REGION=n
>> drivers/dax/hmem/device.c  cxl_acpi.ko         dax_hmem.ko         kmem.ko
>>
>> (1) Add entry
>>     '5d0000000-7cfffffff'
>>                                                (2) Traverse "HMEM devices"
>>                                                    Insert to iomem:
>>                                                    5d0000000-7cfffffff : Soft Reserved
>>
>>                            (3) Insert CXL Window 0/1
>>                                /proc/iomem shows:
>>                                5d0000000-7cfffffff : Soft Reserved
>>                                  5d0000000-6cfffffff : CXL Window 0
>>                                  6d0000000-7cfffffff : CXL Window 1
>>
>>                                                (4) Create dax device
>>                                                                    (5) request_mem_region() fails
>>                                                                        for 5d0000000-7cfffffff
>>                                                                        Reason: Children of
>>                                                                        'Soft Reserved' (CXL
>>                                                                        Windows 0/1) don't cover
>>                                                                        full range
>> ```
>>
>
> Thanks for confirming the failure point.
> I was thinking of two possible ways forward here, and I would like to get
> feedback from others:
>
> [1] Teach dax_hmem to split when the parent claim fails:
> If __request_region() fails for the top-level Soft Reserved range because
> IORES_DESC_CXL children already exist, dax_hmem could iterate those windows
> and register each one individually. The downside is that it adds some
> complexity and feels a bit like papering over the fact that CXL should
> eventually own all of this memory.
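
To make the splitting step in option [1] concrete, here is a minimal
userspace sketch. It is not kernel code: struct range and split_by_windows()
are hypothetical names, and a real implementation would walk the iomem
resource tree rather than a flat array of windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: hypothetical type, not a kernel struct. */
struct range {
	uint64_t start;
	uint64_t end;	/* inclusive, as in struct resource */
};

/*
 * Model of the proposed fallback: when claiming the whole Soft Reserved
 * range has failed, clip each CXL window to that range and return the
 * pieces so the caller can register them one at a time.
 *
 * Returns the number of pieces written to out[] (at most max_out).
 */
size_t split_by_windows(struct range parent,
			const struct range *windows, size_t nr_windows,
			struct range *out, size_t max_out)
{
	size_t n = 0;

	assert(parent.start <= parent.end);

	for (size_t i = 0; i < nr_windows && n < max_out; i++) {
		const struct range *w = &windows[i];

		/* Skip windows that do not overlap the parent at all. */
		if (w->end < parent.start || w->start > parent.end)
			continue;

		/* Keep only the part of the window inside the parent. */
		out[n].start = w->start > parent.start ? w->start : parent.start;
		out[n].end = w->end < parent.end ? w->end : parent.end;
		n++;
	}
	return n;
}
```

With the Soft Reserved range 5d0000000-7cfffffff and the two windows from
the report, this yields two pieces, one per window. Any part of the parent
not covered by a window is simply dropped in this model.
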

I tested the change below to ensure kmem runs first, and it seemed to work:

 static int __init cxl_acpi_init(void)
 {
+	if (!IS_ENABLED(CONFIG_DEV_DAX_CXL) && IS_ENABLED(CONFIG_DEV_DAX_KMEM)) {
+		/* fall back to dax_hmem,kmem */
+		request_module("kmem");
+	}
 	return platform_driver_register(&cxl_acpi_driver);
 }

> As Dan mentioned, the long-term plan is for Linux to not need the
> soft-reserve fallback at all, and simply ignore Soft Reserve for CXL Windows
> because the CXL subsystem will handle it.

The current CXL_REGION Kconfig help text states:

  Otherwise, platform-firmware managed CXL is enabled by being placed in
  the system address map and does not need a driver.

I think this implies that a fallback to dax_hmem/kmem is still required for
such cases. Of course, I personally agree with this long-term plan.

>
> [2] Always unconditionally load CXL early..
> Call request_module("cxl_acpi"); request_module("cxl_pci"); from
> dax_hmem_init() (without the IS_ENABLED(CONFIG_DEV_DAX_CXL) guard). If those
> are y/m, they’ll be present; if n, it’s a no-op. Then in
> hmem_register_device() drop the IS_ENABLED(CONFIG_DEV_DAX_CXL) gate and do:
>
> if (region_intersects(res->start, resource_size(res),
>                       IORESOURCE_MEM, IORES_DESC_CXL) != REGION_DISJOINT)
>         /* defer to CXL */;
>
> and defer to CXL if windows are present. This makes Soft Reserved unavailable
> once CXL Windows have been discovered, even if CXL_REGION is disabled. That
> aligns better with the idea that “CXL should win” whenever a window is
> visible (This also needs to be considered alongside patch 6/6 in my series.)
>
> With CXL_REGION=n there would be no devdax and no kmem for that range;
> /proc/iomem would show only the windows, something like below
>
> 850000000-284fffffff : CXL Window 0
> 2850000000-484fffffff : CXL Window 1
> 4850000000-684fffffff : CXL Window 2
>
> That means the memory is left unclaimed/unavailable.. (no System RAM, no
> /dev/dax). Is that acceptable when CXL_REGION is disabled?
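
The check quoted in [2] can be modeled in userspace roughly as below. This
is only a sketch: enum model_overlap, struct model_res,
model_region_intersects() and defer_to_cxl() are hypothetical stand-ins, and
a flat array replaces the kernel resource tree. It assumes the three
region_intersects() outcomes (disjoint, intersects, mixed) as I understand
them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; not the kernel's types or enum. */
enum model_overlap { MODEL_DISJOINT, MODEL_INTERSECTS, MODEL_MIXED };

struct model_res {
	uint64_t start;
	uint64_t end;		/* inclusive */
	bool is_cxl_window;	/* stands in for IORES_DESC_CXL */
};

/* Classify how [start, start + size - 1] overlaps the CXL windows. */
enum model_overlap model_region_intersects(uint64_t start, uint64_t size,
					    const struct model_res *res,
					    size_t nr)
{
	size_t cxl = 0, other = 0;
	uint64_t end;

	assert(size > 0);
	end = start + size - 1;

	for (size_t i = 0; i < nr; i++) {
		/* Ignore resources that do not overlap the range. */
		if (res[i].end < start || res[i].start > end)
			continue;
		if (res[i].is_cxl_window)
			cxl++;
		else
			other++;
	}

	if (cxl == 0)
		return MODEL_DISJOINT;
	return other == 0 ? MODEL_INTERSECTS : MODEL_MIXED;
}

/* Option [2]: dax_hmem backs off whenever any CXL window overlaps. */
bool defer_to_cxl(uint64_t start, uint64_t size,
		  const struct model_res *res, size_t nr)
{
	return model_region_intersects(start, size, res, nr) != MODEL_DISJOINT;
}
```

Note that the "!= disjoint" test treats a partial (mixed) overlap the same
as a full one, so a Soft Reserved range that is only partly covered by a
window would also be left to CXL in this model.
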

Regarding option [2] (unconditionally loading CXL early): this approach
conflicts with the CXL_REGION Kconfig description mentioned above.

---

To refocus on the original issue, which is the inability to recreate regions
after destruction when CXL Windows overlap with Soft Reserved: I believe your
patch series "[PATCH 0/6] dax/hmem, cxl: Coordinate Soft Reserved handling
with CXL" effectively addresses this problem.

As for the pre-existing issues with !CXL_REGION and the unimplemented
DAX_CXL_MODE_REGISTER, I suggest deferring them for now. They need not be
resolved within this patch set, as we should prioritize the initial problem.

Thanks
Zhijian