Hi Alison,
On 7/15/2025 2:07 PM, Alison Schofield wrote:
On Tue, Jul 15, 2025 at 06:04:00PM +0000, Smita Koralahalli wrote:
This series introduces the ability to manage SOFT RESERVED iomem
resources, enabling the CXL driver to remove any portions that
intersect with created CXL regions.
Hi Smita,
This set applied cleanly to today's cxl-next but fails as appended
below, before region probe.
BTW - there were sparse warnings in the build that look related:
CHECK drivers/dax/hmem/hmem_notify.c
drivers/dax/hmem/hmem_notify.c:10:6: warning: context imbalance in
'hmem_register_fallback_handler' - wrong count at exit
drivers/dax/hmem/hmem_notify.c:24:9: warning: context imbalance in
'hmem_fallback_register_device' - wrong count at exit
Thanks for pointing out this bug. I failed to release the spinlock
before calling hmem_register_device(), which internally calls
platform_device_add() and can sleep. The following fix addresses that
bug; I'll incorporate it into v6:
diff --git a/drivers/dax/hmem/hmem_notify.c b/drivers/dax/hmem/hmem_notify.c
index 6c276c5bd51d..8f411f3fe7bd 100644
--- a/drivers/dax/hmem/hmem_notify.c
+++ b/drivers/dax/hmem/hmem_notify.c
@@ -18,8 +18,9 @@ void hmem_fallback_register_device(int target_nid,
 				   const struct resource *res)
 {
 	walk_hmem_fn hmem_fn;
-	guard(spinlock)(&hmem_notify_lock);
+	spin_lock(&hmem_notify_lock);
 	hmem_fn = hmem_fallback_fn;
+	spin_unlock(&hmem_notify_lock);
 
 	if (hmem_fn)
 		hmem_fn(target_nid, res);
--
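
If keeping the cleanup.h style is preferred, an untested alternative
would be to bound the critical section with scoped_guard() from
<linux/cleanup.h>, so that only the callback-pointer read happens under
the lock; a minimal sketch:

/*
 * Sketch only (untested): sample the callback pointer under the lock,
 * then invoke it outside the critical section, since the registered
 * handler (hmem_register_device()) may sleep.
 */
void hmem_fallback_register_device(int target_nid,
				   const struct resource *res)
{
	walk_hmem_fn hmem_fn;

	scoped_guard(spinlock, &hmem_notify_lock)
		hmem_fn = hmem_fallback_fn;

	if (hmem_fn)
		hmem_fn(target_nid, res);
}

Either way, the point is the same: hmem_notify_lock must not be held
across the call into dax_hmem.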
As for the log:
[ 53.652454] cxl_acpi:cxl_softreserv_mem_work_fn:888: Timeout waiting
for cxl_mem probing
I'm still analyzing that. Here's my thought process so far:
- This occurs when cxl_acpi_probe() runs significantly earlier than
cxl_mem_probe(), so CXL region creation (which happens in
cxl_port_endpoint_probe()) may or may not have completed by the time
trimming is attempted.
- Both cxl_acpi and cxl_mem have MODULE_SOFTDEPs on cxl_port (a sketch
of the declaration follows after this list), which guarantees load
order when all components are built as modules. So even if the timeout
expires before cxl_mem_probe() has run, MODULE_SOFTDEP ensures that
cxl_port is loaded before both cxl_acpi and cxl_mem in modular
configurations. As a result, region creation is eventually guaranteed,
and wait_for_device_probe() will succeed once the relevant probes
complete.
- However, when both CONFIG_CXL_PORT=y and CONFIG_CXL_ACPI=y, there's no
guarantee of probe ordering. In such cases, cxl_acpi_probe() may finish
before cxl_port_probe() even begins, which can cause
wait_for_device_probe() to return prematurely and trigger the timeout.
- In my local setup, I observed that a 30-second timeout was generally
sufficient to cover this race, allowing cxl_port_probe() to run while
cxl_acpi_probe() is still active. Since we cannot mix built-in and
modular components (i.e., have cxl_acpi=y and cxl_port=m), the timeout
serves as a best-effort mechanism. After the timeout,
wait_for_device_probe() ensures cxl_port_probe() has completed before
trimming proceeds, making the logic good enough for most boot-time
races.
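
For reference, the load-order hint mentioned above is the module soft
dependency both drivers already declare, along these lines
(illustrative; see drivers/cxl/acpi.c and drivers/cxl/mem.c for the
exact annotations):

/*
 * Makes modprobe pull in cxl_port before this module. This only
 * affects modular builds; it has no effect on built-in probe ordering
 * (the CONFIG_CXL_PORT=y + CONFIG_CXL_ACPI=y case above).
 */
MODULE_SOFTDEP("pre: cxl_port");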
One possible improvement I'm considering is to schedule a delayed work
item from cxl_acpi_probe(). This deferred work could wait slightly
longer for cxl_mem_probe() to complete (which itself softdeps on
cxl_port) before initiating the soft reserve trimming.
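
Roughly, that would look something like the sketch below (the delay
constant and the static delayed work are placeholders, not the actual
cxl_acpi code; cxl_softreserv_mem_work_fn() is the work handler from
this series):

/* Sketch only: defer soft-reserve trimming off the probe path. */
#define CXL_SR_TRIM_DELAY_MS	(30 * MSEC_PER_SEC)	/* made-up knob */

static DECLARE_DELAYED_WORK(cxl_sr_dwork, cxl_softreserv_mem_work_fn);

static int cxl_acpi_probe(struct platform_device *pdev)
{
	/* ... existing cxl_acpi probe work ... */

	/*
	 * Run the trim later on the system workqueue; by then
	 * cxl_port/cxl_mem probing has had extra time, and the work
	 * function can still call wait_for_device_probe() to flush
	 * any probes that are in flight when it runs.
	 */
	schedule_delayed_work(&cxl_sr_dwork,
			      msecs_to_jiffies(CXL_SR_TRIM_DELAY_MS));

	return 0;
}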
That said, I'm still evaluating better options to more robustly
coordinate probe ordering between cxl_acpi, cxl_port, cxl_mem and
cxl_region, and I'm open to suggestions here.
Thanks
Smita
This isn't all the logs, I trimmed. Let me know if you need more or
other info to reproduce.
[ 53.652454] cxl_acpi:cxl_softreserv_mem_work_fn:888: Timeout waiting for
cxl_mem probing
[ 53.653293] BUG: sleeping function called from invalid context at
./include/linux/sched/mm.h:321
[ 53.653513] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1875,
name: kworker/46:1
[ 53.653540] preempt_count: 1, expected: 0
[ 53.653554] RCU nest depth: 0, expected: 0
[ 53.653568] 3 locks held by kworker/46:1/1875:
[ 53.653569] #0: ff37d78240041548 ((wq_completion)events){+.+.}-{0:0}, at:
process_one_work+0x578/0x630
[ 53.653583] #1: ff6b0385dedf3e38 (cxl_sr_work){+.+.}-{0:0}, at:
process_one_work+0x1bd/0x630
[ 53.653589] #2: ffffffffb33476d8 (hmem_notify_lock){+.+.}-{3:3}, at:
hmem_fallback_register_device+0x23/0x60
[ 53.653598] Preemption disabled at:
[ 53.653599] [<ffffffffb1e23993>] hmem_fallback_register_device+0x23/0x60
[ 53.653640] CPU: 46 UID: 0 PID: 1875 Comm: kworker/46:1 Not tainted
6.16.0CXL-NEXT-ALISON-SR-V5+ #5 PREEMPT(voluntary)
[ 53.653643] Workqueue: events cxl_softreserv_mem_work_fn [cxl_acpi]
[ 53.653648] Call Trace:
[ 53.653649] <TASK>
[ 53.653652] dump_stack_lvl+0xa8/0xd0
[ 53.653658] dump_stack+0x14/0x20
[ 53.653659] __might_resched+0x1ae/0x2d0
[ 53.653666] __might_sleep+0x48/0x70
[ 53.653668] __kmalloc_node_track_caller_noprof+0x349/0x510
[ 53.653674] ? __devm_add_action+0x3d/0x160
[ 53.653685] ? __pfx_devm_action_release+0x10/0x10
[ 53.653688] __devres_alloc_node+0x4a/0x90
[ 53.653689] ? __devres_alloc_node+0x4a/0x90
[ 53.653691] ? __pfx_release_memregion+0x10/0x10 [dax_hmem]
[ 53.653693] __devm_add_action+0x3d/0x160
[ 53.653696] hmem_register_device+0xea/0x230 [dax_hmem]
[ 53.653700] hmem_fallback_register_device+0x37/0x60
[ 53.653703] cxl_softreserv_mem_register+0x24/0x30 [cxl_core]
[ 53.653739] walk_iomem_res_desc+0x55/0xb0
[ 53.653744] ? __pfx_cxl_softreserv_mem_register+0x10/0x10 [cxl_core]
[ 53.653755] cxl_region_softreserv_update+0x46/0x50 [cxl_core]
[ 53.653761] cxl_softreserv_mem_work_fn+0x4a/0x110 [cxl_acpi]
[ 53.653763] ? __pfx_autoremove_wake_function+0x10/0x10
[ 53.653768] process_one_work+0x1fa/0x630
[ 53.653774] worker_thread+0x1b2/0x360
[ 53.653777] kthread+0x128/0x250
[ 53.653781] ? __pfx_worker_thread+0x10/0x10
[ 53.653784] ? __pfx_kthread+0x10/0x10
[ 53.653786] ret_from_fork+0x139/0x1e0
[ 53.653790] ? __pfx_kthread+0x10/0x10
[ 53.653792] ret_from_fork_asm+0x1a/0x30
[ 53.653801] </TASK>
[ 53.654193] =============================
[ 53.654203] [ BUG: Invalid wait context ]
[ 53.654451] 6.16.0CXL-NEXT-ALISON-SR-V5+ #5 Tainted: G W
[ 53.654623] -----------------------------
[ 53.654785] kworker/46:1/1875 is trying to lock:
[ 53.654946] ff37d7824096d588 (&root->kernfs_rwsem){++++}-{4:4}, at:
kernfs_add_one+0x34/0x390
[ 53.655115] other info that might help us debug this:
[ 53.655273] context-{5:5}
[ 53.655428] 3 locks held by kworker/46:1/1875:
[ 53.655579] #0: ff37d78240041548 ((wq_completion)events){+.+.}-{0:0}, at:
process_one_work+0x578/0x630
[ 53.655739] #1: ff6b0385dedf3e38 (cxl_sr_work){+.+.}-{0:0}, at:
process_one_work+0x1bd/0x630
[ 53.655900] #2: ffffffffb33476d8 (hmem_notify_lock){+.+.}-{3:3}, at:
hmem_fallback_register_device+0x23/0x60
[ 53.656062] stack backtrace:
[ 53.656224] CPU: 46 UID: 0 PID: 1875 Comm: kworker/46:1 Tainted: G W
6.16.0CXL-NEXT-ALISON-SR-V5+ #5 PREEMPT(voluntary)
[ 53.656227] Tainted: [W]=WARN
[ 53.656228] Workqueue: events cxl_softreserv_mem_work_fn [cxl_acpi]
[ 53.656232] Call Trace:
[ 53.656232] <TASK>
[ 53.656234] dump_stack_lvl+0x85/0xd0
[ 53.656238] dump_stack+0x14/0x20
[ 53.656239] __lock_acquire+0xaf4/0x2200
[ 53.656246] lock_acquire+0xd8/0x300
[ 53.656248] ? kernfs_add_one+0x34/0x390
[ 53.656252] ? __might_resched+0x208/0x2d0
[ 53.656257] down_write+0x44/0xe0
[ 53.656262] ? kernfs_add_one+0x34/0x390
[ 53.656263] kernfs_add_one+0x34/0x390
[ 53.656265] kernfs_create_dir_ns+0x5a/0xa0
[ 53.656268] sysfs_create_dir_ns+0x74/0xd0
[ 53.656270] kobject_add_internal+0xb1/0x2f0
[ 53.656273] kobject_add+0x7d/0xf0
[ 53.656275] ? get_device_parent+0x28/0x1e0
[ 53.656280] ? __pfx_klist_children_get+0x10/0x10
[ 53.656282] device_add+0x124/0x8b0
[ 53.656285] ? dev_set_name+0x56/0x70
[ 53.656287] platform_device_add+0x102/0x260
[ 53.656289] hmem_register_device+0x160/0x230 [dax_hmem]
[ 53.656291] hmem_fallback_register_device+0x37/0x60
[ 53.656294] cxl_softreserv_mem_register+0x24/0x30 [cxl_core]
[ 53.656323] walk_iomem_res_desc+0x55/0xb0
[ 53.656326] ? __pfx_cxl_softreserv_mem_register+0x10/0x10 [cxl_core]
[ 53.656335] cxl_region_softreserv_update+0x46/0x50 [cxl_core]
[ 53.656342] cxl_softreserv_mem_work_fn+0x4a/0x110 [cxl_acpi]
[ 53.656343] ? __pfx_autoremove_wake_function+0x10/0x10
[ 53.656346] process_one_work+0x1fa/0x630
[ 53.656350] worker_thread+0x1b2/0x360
[ 53.656352] kthread+0x128/0x250
[ 53.656354] ? __pfx_worker_thread+0x10/0x10
[ 53.656356] ? __pfx_kthread+0x10/0x10
[ 53.656357] ret_from_fork+0x139/0x1e0
[ 53.656360] ? __pfx_kthread+0x10/0x10
[ 53.656361] ret_from_fork_asm+0x1a/0x30
[ 53.656366] </TASK>
[ 53.662274] BUG: scheduling while atomic: kworker/46:1/1875/0x00000002
[ 53.663552] schedule+0x4a/0x160
[ 53.663553] schedule_timeout+0x10a/0x120
[ 53.663555] ? debug_smp_processor_id+0x1b/0x30
[ 53.663556] ? trace_hardirqs_on+0x5f/0xd0
[ 53.663558] __wait_for_common+0xb9/0x1c0
[ 53.663559] ? __pfx_schedule_timeout+0x10/0x10
[ 53.663561] wait_for_completion+0x28/0x30
[ 53.663562] __synchronize_srcu+0xbf/0x180
[ 53.663566] ? __pfx_wakeme_after_rcu+0x10/0x10
[ 53.663571] ? i2c_repstart+0x30/0x80
[ 53.663576] synchronize_srcu+0x46/0x120
[ 53.663577] kill_dax+0x47/0x70
[ 53.663580] __devm_create_dev_dax+0x112/0x470
[ 53.663582] devm_create_dev_dax+0x26/0x50
[ 53.663584] dax_hmem_probe+0x87/0xd0 [dax_hmem]
[ 53.663585] platform_probe+0x61/0xd0
[ 53.663589] really_probe+0xe2/0x390
[ 53.663591] ? __pfx___device_attach_driver+0x10/0x10
[ 53.663593] __driver_probe_device+0x7e/0x160
[ 53.663594] driver_probe_device+0x23/0xa0
[ 53.663596] __device_attach_driver+0x92/0x120
[ 53.663597] bus_for_each_drv+0x8c/0xf0
[ 53.663599] __device_attach+0xc2/0x1f0
[ 53.663601] device_initial_probe+0x17/0x20
[ 53.663603] bus_probe_device+0xa8/0xb0
[ 53.663604] device_add+0x687/0x8b0
[ 53.663607] ? dev_set_name+0x56/0x70
[ 53.663609] platform_device_add+0x102/0x260
[ 53.663610] hmem_register_device+0x160/0x230 [dax_hmem]
[ 53.663612] hmem_fallback_register_device+0x37/0x60
[ 53.663614] cxl_softreserv_mem_register+0x24/0x30 [cxl_core]
[ 53.663637] walk_iomem_res_desc+0x55/0xb0
[ 53.663640] ? __pfx_cxl_softreserv_mem_register+0x10/0x10 [cxl_core]
[ 53.663647] cxl_region_softreserv_update+0x46/0x50 [cxl_core]
[ 53.663654] cxl_softreserv_mem_work_fn+0x4a/0x110 [cxl_acpi]
[ 53.663655] ? __pfx_autoremove_wake_function+0x10/0x10
[ 53.663658] process_one_work+0x1fa/0x630
[ 53.663662] worker_thread+0x1b2/0x360
[ 53.663664] kthread+0x128/0x250
[ 53.663666] ? __pfx_worker_thread+0x10/0x10
[ 53.663668] ? __pfx_kthread+0x10/0x10
[ 53.663670] ret_from_fork+0x139/0x1e0
[ 53.663672] ? __pfx_kthread+0x10/0x10
[ 53.663673] ret_from_fork_asm+0x1a/0x30
[ 53.663677] </TASK>
[ 53.700107] BUG: scheduling while atomic: kworker/46:1/1875/0x00000002
[ 53.700264] INFO: lockdep is turned off.
[ 53.701315] Preemption disabled at:
[ 53.701316] [<ffffffffb1e23993>] hmem_fallback_register_device+0x23/0x60
[ 53.701631] CPU: 46 UID: 0 PID: 1875 Comm: kworker/46:1 Tainted: G W
6.16.0CXL-NEXT-ALISON-SR-V5+ #5 PREEMPT(voluntary)
[ 53.701633] Tainted: [W]=WARN
[ 53.701635] Workqueue: events cxl_softreserv_mem_work_fn [cxl_acpi]
[ 53.701638] Call Trace:
[ 53.701638] <TASK>
[ 53.701640] dump_stack_lvl+0xa8/0xd0
[ 53.701644] dump_stack+0x14/0x20
[ 53.701645] __schedule_bug+0xa2/0xd0
[ 53.701649] __schedule+0xe6f/0x10d0
[ 53.701652] ? debug_smp_processor_id+0x1b/0x30
[ 53.701655] ? lock_release+0x1e6/0x2b0
[ 53.701658] ? trace_hardirqs_on+0x5f/0xd0
[ 53.701661] schedule+0x4a/0x160
[ 53.701662] schedule_timeout+0x10a/0x120
[ 53.701664] ? debug_smp_processor_id+0x1b/0x30
[ 53.701666] ? trace_hardirqs_on+0x5f/0xd0
[ 53.701667] __wait_for_common+0xb9/0x1c0
[ 53.701668] ? __pfx_schedule_timeout+0x10/0x10
[ 53.701670] wait_for_completion+0x28/0x30
[ 53.701671] __synchronize_srcu+0xbf/0x180
[ 53.701677] ? __pfx_wakeme_after_rcu+0x10/0x10
[ 53.701682] ? i2c_repstart+0x30/0x80
[ 53.701685] synchronize_srcu+0x46/0x120
[ 53.701687] kill_dax+0x47/0x70
[ 53.701689] __devm_create_dev_dax+0x112/0x470
[ 53.701691] devm_create_dev_dax+0x26/0x50
[ 53.701693] dax_hmem_probe+0x87/0xd0 [dax_hmem]
[ 53.701695] platform_probe+0x61/0xd0
[ 53.701698] really_probe+0xe2/0x390
[ 53.701700] ? __pfx___device_attach_driver+0x10/0x10
[ 53.701701] __driver_probe_device+0x7e/0x160
[ 53.701703] driver_probe_device+0x23/0xa0
[ 53.701704] __device_attach_driver+0x92/0x120
[ 53.701706] bus_for_each_drv+0x8c/0xf0
[ 53.701708] __device_attach+0xc2/0x1f0
[ 53.701710] device_initial_probe+0x17/0x20
[ 53.701711] bus_probe_device+0xa8/0xb0
[ 53.701712] device_add+0x687/0x8b0
[ 53.701715] ? dev_set_name+0x56/0x70
[ 53.701717] platform_device_add+0x102/0x260
[ 53.701718] hmem_register_device+0x160/0x230 [dax_hmem]
[ 53.701720] hmem_fallback_register_device+0x37/0x60
[ 53.701722] cxl_softreserv_mem_register+0x24/0x30 [cxl_core]
[ 53.701734] walk_iomem_res_desc+0x55/0xb0
[ 53.701738] ? __pfx_cxl_softreserv_mem_register+0x10/0x10 [cxl_core]
[ 53.701745] cxl_region_softreserv_update+0x46/0x50 [cxl_core]
[ 53.701751] cxl_softreserv_mem_work_fn+0x4a/0x110 [cxl_acpi]
[ 53.701752] ? __pfx_autoremove_wake_function+0x10/0x10
[ 53.701756] process_one_work+0x1fa/0x630
[ 53.701760] worker_thread+0x1b2/0x360
[ 53.701762] kthread+0x128/0x250
[ 53.701765] ? __pfx_worker_thread+0x10/0x10
[ 53.701766] ? __pfx_kthread+0x10/0x10
[ 53.701768] ret_from_fork+0x139/0x1e0
[ 53.701771] ? __pfx_kthread+0x10/0x10
[ 53.701772] ret_from_fork_asm+0x1a/0x30
[ 53.701777] </TASK>