On Wed, May 7, 2025 at 2:38 PM Arvind Yadav <[email protected]> wrote:
>
> Switch cancel_delayed_work() to cancel_delayed_work_sync() to ensure
> the delayed work has finished executing before proceeding with
> resource cleanup. This prevents a potential use-after-free or
> NULL dereference if the resume_work is still running during finalization.

There are several other places with similar patterns that look suspect.
E.g., amdgpu_userq_destroy() and amdgpu_userq_evict().

Alex

>
> BUG: kernel NULL pointer dereference, address: 0000000000000140
> [ +0.000050] #PF: supervisor read access in kernel mode
> [ +0.000019] #PF: error_code(0x0000) - not-present page
> [ +0.000021] PGD 0 P4D 0
> [ +0.000015] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ +0.000021] CPU: 17 UID: 0 PID: 196299 Comm: kworker/17:0 Tainted: G U 6.14.0-org-staging #1
> [ +0.000032] Tainted: [U]=USER
> [ +0.000015] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F39 03/22/2024
> [ +0.000029] Workqueue: events amdgpu_userq_restore_worker [amdgpu]
> [ +0.000426] RIP: 0010:drm_exec_lock_obj+0x32/0x210 [drm_exec]
> [ +0.000025] Code: e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 48 83 ec 08 4c 8b 77 30 4d 85 f6 0f 85 c0 00 00 00 4c 8d 7f 08 48 39 77 38 74 54 <49> 8b bd f8 00 00 00 4c 89 fe 41 f6 04 24 01 75 3c e8 08 50 bc e0
> [ +0.000046] RSP: 0018:ffffab1b04da3ce8 EFLAGS: 00010297
> [ +0.000020] RAX: 0000000000000001 RBX: ffff930cc60e4bc0 RCX: 0000000000000000
> [ +0.000025] RDX: 0000000000000004 RSI: 0000000000000048 RDI: ffffab1b04da3d88
> [ +0.000028] RBP: ffffab1b04da3d10 R08: ffff930cc60e4000 R09: 0000000000000000
> [ +0.000022] R10: ffffab1b04da3d18 R11: 0000000000000001 R12: ffffab1b04da3d88
> [ +0.000023] R13: 0000000000000048 R14: 0000000000000000 R15: ffffab1b04da3d90
> [ +0.000023] FS: 0000000000000000(0000) GS:ffff9313dea80000(0000) knlGS:0000000000000000
> [ +0.000024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ +0.000021] CR2: 0000000000000140 CR3: 000000018351a000 CR4: 0000000000350ef0
> [ +0.000025] Call Trace:
> [ +0.000018] <TASK>
> [ +0.000015] ? show_regs+0x69/0x80
> [ +0.000022] ? __die+0x25/0x70
> [ +0.000019] ? page_fault_oops+0x15d/0x510
> [ +0.000024] ? do_user_addr_fault+0x312/0x690
> [ +0.000024] ? sched_clock_cpu+0x10/0x1a0
> [ +0.000028] ? exc_page_fault+0x78/0x1b0
> [ +0.000025] ? asm_exc_page_fault+0x27/0x30
> [ +0.000024] ? drm_exec_lock_obj+0x32/0x210 [drm_exec]
> [ +0.000024] drm_exec_prepare_obj+0x21/0x60 [drm_exec]
> [ +0.000021] amdgpu_vm_lock_pd+0x22/0x30 [amdgpu]
> [ +0.000266] amdgpu_userq_validate_bos+0x6c/0x320 [amdgpu]
> [ +0.000333] amdgpu_userq_restore_worker+0x4a/0x120 [amdgpu]
> [ +0.000316] process_one_work+0x189/0x3c0
> [ +0.000021] worker_thread+0x2a4/0x3b0
> [ +0.000022] kthread+0x109/0x220
> [ +0.000018] ? __pfx_worker_thread+0x10/0x10
> [ +0.000779] ? _raw_spin_unlock_irq+0x1f/0x40
> [ +0.000560] ? __pfx_kthread+0x10/0x10
> [ +0.000543] ret_from_fork+0x3c/0x60
> [ +0.000507] ? __pfx_kthread+0x10/0x10
> [ +0.000515] ret_from_fork_asm+0x1a/0x30
> [ +0.000515] </TASK>
>
> Cc: Alex Deucher <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Sunil Khatri <[email protected]>
> Signed-off-by: Arvind Yadav <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index afbe01149ed3..711e190a6a82 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -774,7 +774,7 @@ void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
>  	struct amdgpu_userq_mgr *uqm, *tmp;
>  	uint32_t queue_id;
>
> -	cancel_delayed_work(&userq_mgr->resume_work);
> +	cancel_delayed_work_sync(&userq_mgr->resume_work);
>
>  	mutex_lock(&userq_mgr->userq_mutex);
>  	idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id) {
> --
> 2.34.1
>
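For reference, the difference the patch relies on: cancel_delayed_work() only removes a pending item and can return while the callback is still executing, whereas cancel_delayed_work_sync() additionally waits for a running callback to finish. Below is a minimal userspace sketch of that difference, assuming POSIX threads and C11 atomics. toy_work, toy_worker, toy_cancel, toy_cancel_sync, slow_restore and demo are hypothetical names, not kernel APIs, and the model ignores timers, re-queueing and the other details the real workqueue code handles.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace model of a work item; not a kernel API. */
struct toy_work {
	pthread_mutex_t lock;
	pthread_cond_t idle;		/* signalled when the callback returns */
	bool pending;			/* queued, callback not started yet */
	bool running;			/* callback currently executing */
	void (*fn)(struct toy_work *);
};

/* Worker thread: run the callback once if it is still pending. */
static void *toy_worker(void *arg)
{
	struct toy_work *w = arg;

	pthread_mutex_lock(&w->lock);
	if (w->pending) {
		w->pending = false;
		w->running = true;
		pthread_mutex_unlock(&w->lock);
		w->fn(w);		/* callback runs without the lock held */
		pthread_mutex_lock(&w->lock);
		w->running = false;
		pthread_cond_broadcast(&w->idle);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Modeled on cancel_delayed_work(): drop a pending run, do not wait. */
static bool toy_cancel(struct toy_work *w)
{
	pthread_mutex_lock(&w->lock);
	bool was_pending = w->pending;
	w->pending = false;
	pthread_mutex_unlock(&w->lock);
	return was_pending;
}

/* Modeled on cancel_delayed_work_sync(): also wait for a running callback. */
static bool toy_cancel_sync(struct toy_work *w)
{
	pthread_mutex_lock(&w->lock);
	bool was_pending = w->pending;
	w->pending = false;
	while (w->running)
		pthread_cond_wait(&w->idle, &w->lock);
	assert(!w->running);	/* callback finished: teardown is safe */
	pthread_mutex_unlock(&w->lock);
	return was_pending;
}

static atomic_int started, done;

/* Hypothetical stand-in for a slow restore callback. */
static void slow_restore(struct toy_work *w)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	(void)w;
	atomic_store(&started, 1);
	nanosleep(&ts, NULL);
	atomic_store(&done, 1);
}

/* Cancel while the callback runs; return 'done' as seen right after. */
static int demo(bool sync)
{
	struct toy_work w = { .pending = true, .fn = slow_restore };
	pthread_t t;
	int seen;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.idle, NULL);
	atomic_store(&started, 0);
	atomic_store(&done, 0);
	pthread_create(&t, NULL, toy_worker, &w);
	while (!atomic_load(&started))
		;			/* spin until the callback is running */
	if (sync)
		toy_cancel_sync(&w);
	else
		toy_cancel(&w);
	seen = atomic_load(&done);
	pthread_join(t, NULL);		/* keep 'w' alive until the worker exits */
	pthread_cond_destroy(&w.idle);
	pthread_mutex_destroy(&w.lock);
	return seen;
}
```

In this model demo(true) returns 1, because toy_cancel_sync() blocks until slow_restore() has returned, so state the callback touches could then be torn down. demo(false) will usually return 0: that is the window in which freeing the manager, as in the oops above, would be unsafe.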
