This series aims to address long-standing conflicts between HMEM and CXL when handling Soft Reserved memory ranges.

Reworked from Dan's patch:
https://lore.kernel.org/all/[email protected]/

Previous work:
https://lore.kernel.org/all/[email protected]/

Link to v6:
https://lore.kernel.org/all/[email protected]/

The series is based on Linux 7.0-rc4.
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c

[1] After offlining the memory I can tear down the regions and recreate
them. dax_cxl creates dax devices and onlines memory.

    850000000-284fffffff : CXL Window 0
    850000000-284fffffff : region0
    850000000-284fffffff : dax0.0
    850000000-284fffffff : System RAM (kmem)

[2] With CONFIG_CXL_REGION disabled, all the resources are handled by
HMEM. The Soft Reserved range shows up in /proc/iomem, no regions come
up, and dax devices are created from HMEM.

    850000000-284fffffff : CXL Window 0
    850000000-284fffffff : Soft Reserved
    850000000-284fffffff : dax0.0
    850000000-284fffffff : System RAM (kmem)

[3] Region assembly failure: the Soft Reserved range shows up in
/proc/iomem and dax devices are handled by HMEM.

    850000000-284fffffff : Soft Reserved
    850000000-284fffffff : CXL Window 0
    850000000-284fffffff : region0
    850000000-284fffffff : dax6.0
    850000000-284fffffff : System RAM (kmem)

[4] REGISTER path: the results are as expected with both CXL_BUS=y and
CXL_BUS=m. To validate the REGISTER path, I forced REGISTER even in
cases where SR completely overlaps the CXL region, as I did not have
access to a system where the CXL region range is smaller than the SR
range.

    850000000-284fffffff : Soft Reserved
    850000000-284fffffff : CXL Window 0
    850000000-280fffffff : region0
    850000000-284fffffff : dax6.0
    850000000-284fffffff : System RAM (kmem)

kreview complained about a possible deadlock from taking
pdev->dev.mutex before wait_for_device_probe(). Hence, I moved it.

From kreview:

The guard(device) takes pdev->dev.mutex and holds it across
wait_for_device_probe(). If any probe function in the system tries to
access this device (directly or indirectly), it would need the same
mutex:

    process_defer_work()
      guard(device)(&pdev->dev)      <- Takes pdev->dev.mutex
      wait_for_device_probe()        <- Waits for all probes globally
        wait_event(probe_count == 0)

Meanwhile, if another driver's probe:

    some_driver_probe()
      device_lock(&pdev->dev)        <- Blocks waiting for mutex

The probe can't complete while waiting for the mutex, and
wait_for_device_probe() won't return while the probe is pending.

v7 updates:
- Added Reviewed-by tags.
- co-developed-by -> Suggested-by for Patch 4.
- Dropped "cxl/region: Skip decoder reset for auto-discovered regions"
  patch.
- cxl_region_contains_soft_reserve() -> cxl_region_contains_resource()
- Dropped scoped_guard around request_resource() and
  release_resource().
- Dropped patch 7. All deferred work infrastructure moved from bus.c
  into hmem.c
- Dropped enum dax_cxl_mode (DEFER/REGISTER/DROP) and replaced it with
  bool dax_hmem_initial_probe in device.c (built-in, survives module
  reload).
- Changed from all-or-nothing to per-range ownership decisions. Each
  range is decided individually: CXL keeps what it covers, HMEM gets
  the rest.
- Replaced the two-pass walk with a single-pass walk to exercise
  per-range ownership.
- Moved wait_for_device_probe() before guard(device) to avoid lockdep
  warning (kreview, Gregory).
- Added guard(device) + driver bound check.
- Added get_device()/put_device() for pdev refcount.
- Added flush_work() in dax_hmem_exit() to ensure work completes before
  module unload.
- dax_hmem_flush_work() exported from dax_hmem.ko; the symbol
  dependency forces dax_hmem to load before dax_cxl (Dan requirement 2).
- Added static inline no-op stub in bus.h for CONFIG_DEV_DAX_HMEM=n.
- Added work_pending() check (Dan requirement 3).
- pdev and work_struct initialized together on first probe, making the
  singleton nature explicit. static struct and INIT_WORK once.
- Reverted back to container_of() in work function instead of global
  variables.
- No kill_defer_work() with the struct being static.

v6 updates:
- Patch 1-3 no changes.
- New Patches 4-5.
- (void *)res -> res.
- cxl_region_contains_soft_reserve -> region_contains_soft_reserve.
- New file include/cxl/cxl.h
- Introduced singleton workqueue.
- hmem to queue the work and cxl to flush.
- cxl_contains_soft_reserve() -> soft_reserve_has_cxl_match().
- Included descriptions for dax_cxl_mode.
- kzalloc -> kmalloc in add_soft_reserve_into_iomem()
- dax_cxl_mode is exported to CXL.
- Introduced hmem_register_cxl_device() for walking only CXL
  intersected SR ranges the second time.

v5 updates:
- Patch 1 dropped as it's been merged in for-7.0/cxl-init.
- Added Reviewed-by tags.
- Shared dax_cxl_mode between dax/cxl.c and dax/hmem.c and used
  -EPROBE_DEFER to defer dax_cxl.
- CXL_REGION_F_AUTO check for resetting decoders.
- Teardown all CXL regions if any one CXL region doesn't fully contain
  the Soft Reserved range.
- Added helper cxl_region_contains_sr() to determine Soft Reserved
  ownership.
- bus_rescan_devices() to retry dax_cxl.
- Added guard(rwsem_read)(&cxl_rwsem.region).

v4 updates:
- No changes patches 1-3.
- New patches 4-7.
- handle_deferred_cxl() has been enhanced to handle the case where CXL
  regions do not contiguously and fully cover Soft Reserved ranges.
- Support added to defer cxl_dax registration.
- Support added to teardown cxl regions.

v3 updates:
- Fixed two "From".

v2 updates:
- Removed conditional check on CONFIG_EFI_SOFT_RESERVE as dax_hmem
  depends on CONFIG_EFI_SOFT_RESERVE. (Zhijian)
- Added TODO note. (Zhijian)
- Included region_intersects_soft_reserve() inside
  CONFIG_EFI_SOFT_RESERVE conditional check. (Zhijian)
- insert_resource_late() -> insert_resource_expand_to_fit() and
  __insert_resource_expand_to_fit() replacement. (Boris)
- Fixed Co-developed and Signed-off by. (Dan)
- Combined 2/6 and 3/6 into a single patch. (Zhijian)
- Skip local variable in remove_soft_reserved. (Jonathan)
- Drop kfree with __free(). (Jonathan)
- return 0 -> return dev_add_action_or_reset(host...) (Jonathan)
- Dropped 6/6.
- Reviewed-by tags (Dave, Jonathan)

Dan Williams (3):
  dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
  dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
  dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding

Smita Koralahalli (4):
  dax: Track all dax_region allocations under a global resource tree
  cxl/region: Add helper to check Soft Reserved containment by CXL regions
  dax/hmem, cxl: Defer and resolve Soft Reserved ownership
  dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree

 drivers/cxl/core/region.c |  30 ++++++++++
 drivers/dax/Kconfig       |   2 +
 drivers/dax/Makefile      |   3 +-
 drivers/dax/bus.c         |  20 ++++++-
 drivers/dax/bus.h         |   7 +++
 drivers/dax/cxl.c         |  28 ++++++++-
 drivers/dax/hmem/device.c |   3 +
 drivers/dax/hmem/hmem.c   | 117 ++++++++++++++++++++++++++++++++++----
 include/cxl/cxl.h         |  15 +++++
 9 files changed, 208 insertions(+), 17 deletions(-)
 create mode 100644 include/cxl/cxl.h

base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c
-- 
2.17.1
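
For readers who want the per-range ownership rule from the v7 notes in
concrete form, here is a minimal userspace sketch. It is not the code
from the series: struct range, enum owner, range_contains() and
decide_owner() are hypothetical names, and it assumes that a Soft
Reserved range goes to CXL only when a single CXL region fully contains
it, with HMEM taking every other range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, like the start-end pairs in /proc/iomem. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical result type: which driver ends up owning a range. */
enum owner { OWNER_CXL, OWNER_HMEM };

/* True when 'outer' fully covers 'inner' (both ends inclusive). */
static bool range_contains(const struct range *outer,
			   const struct range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/*
 * Hypothetical helper: decide ownership of one Soft Reserved range.
 * Assumption: CXL owns the range only if one assembled region fully
 * contains it; otherwise the range falls back to HMEM.
 */
static enum owner decide_owner(const struct range *sr,
			       const struct range *regions,
			       size_t nr_regions)
{
	for (size_t i = 0; i < nr_regions; i++) {
		if (range_contains(&regions[i], sr))
			return OWNER_CXL;
	}
	return OWNER_HMEM;
}
```

Calling decide_owner() once per Soft Reserved range gives the
single-pass, per-range behaviour: a range matching region0 stays with
CXL, while a range that extends past the region, as in the forced
REGISTER test, is handed to HMEM.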

