On 2025-07-08 16:14, Gang Ba wrote:
If the VM belongs to another process, this is an fclose after fork;
waiting here may enable signaling of the KFD eviction fence and cause the
parent process's queue to be evicted.
Signed-off-by: Gang Ba <gang...@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f042372d9f2e..8ee1b7e62dee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2410,6 +2410,13 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint32_t min_vm_size,
*/
long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout)
{
+ /* If vm belongs to another process, this is fclose after fork,
+ * wait may enable signaling KFD eviction fence and cause parent process queue evicted.
+ */
+ if (vm->task_info->tgid &&
+ vm->task_info->tgid != current->group_leader->pid)
+ return 0;
+

Only apply this check to KFD VMs, in case it causes a gfx test regression:

	if (vm->is_compute_context &&
	    vm->task_info->tgid != current->group_leader->pid)
		return 0;

Regards,
Philip
timeout = dma_resv_wait_timeout(vm->root.bo->tbo.base.resv,
DMA_RESV_USAGE_BOOKKEEP,
true, timeout);
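
For illustration only, the condition suggested above can be sketched as a standalone userspace predicate. This is a rough sketch, not kernel code: `mock_vm`, `mock_task_info`, and `vm_wait_should_skip()` are hypothetical names, and the caller's tgid is passed in explicitly as a stand-in for `current->group_leader->pid`. It also assumes `task_info` is always valid, which the real function may need to verify.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * used by the check are modeled here. */
struct mock_task_info {
	pid_t tgid;		/* thread group id of the process that owns the VM */
};

struct mock_vm {
	bool is_compute_context;	/* true for a KFD (compute) VM */
	struct mock_task_info *task_info;
};

/* Returns true when the idle wait should be skipped: the VM is a compute
 * VM and it is owned by a process other than the caller. caller_tgid is
 * an assumption standing in for current->group_leader->pid. */
static bool vm_wait_should_skip(const struct mock_vm *vm, pid_t caller_tgid)
{
	return vm->is_compute_context &&
	       vm->task_info->tgid != caller_tgid;
}
```

With this shape, a gfx VM never takes the early return regardless of which process calls in, so only the KFD fork case changes behavior.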