If mmap write lock is taken while draining retry fault, mmap write lock is not released because svm_range_restore_pages calls mmap_read_unlock then returns. This causes deadlock and systen hang later because mmap read or write lock cannot be taken.
Downgrade mmap write lock to read lock if draining retry fault fix this bug. Signed-off-by: Philip Yang <philip.y...@amd.com> --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 6604a37b304f..fb02ff9ae62a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -3043,6 +3043,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid, if (svms->checkpoint_ts[gpuidx] != 0) { if (amdgpu_ih_ts_after_or_equal(ts, svms->checkpoint_ts[gpuidx])) { pr_debug("draining retry fault, drop fault 0x%llx\n", addr); + if (write_locked) + mmap_write_downgrade(mm); r = -EAGAIN; goto out_unlock_svms; } else { -- 2.49.0