On Wed, Apr 29, 2026 at 7:37 PM Mikhail Gavrilov <[email protected]> wrote:
>
> When dumping IB contents from a hung job, amdgpu_devcoredump_format()
> acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid()
> and then, for each IB referenced by the job, calls amdgpu_bo_reserve()
> on the BO that backs the IB. Both reservations are taken on
> reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx,
> which trips lockdep:
>
>   WARNING: possible recursive locking detected
>   --------------------------------------------
>   kworker/u128:0 is trying to acquire lock:
>   ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
>     at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
>
>   but task is already holding lock:
>   ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
>     at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
>
>   Possible unsafe locking scenario:
>         CPU0
>         ----
>     lock(reservation_ww_class_mutex);
>     lock(reservation_ww_class_mutex);
>
>    *** DEADLOCK ***
>    May be due to missing lock nesting notation
>
>   Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
>   Call Trace:
>    __ww_mutex_lock.constprop.0
>    ww_mutex_lock
>    amdgpu_bo_reserve
>    amdgpu_devcoredump_format+0x1594 [amdgpu]
>    amdgpu_devcoredump_deferred_work+0xea [amdgpu]
>    process_one_work
>    worker_thread
>    kthread
>
Friendly ping.

Pierre-Eric, Christian, Alex, any thoughts on this fix? Happy to spin
a v2 with any review feedback.

One thing I'm aware of: the `Cc: [email protected] # 7.1` tag is probably
unnecessary since the regression only landed in 7.1-rc1 and the fix will
reach 7.1 final naturally via drm-fixes; I can drop it in v2 if preferred.

-- 
Best Regards,
Mike Gavrilov.
