On 6/6/25 09:51, Liang, Prike wrote: > [Public] > >> From: Alex Deucher <alexdeuc...@gmail.com> >> Sent: Saturday, May 31, 2025 5:54 AM >> To: Liang, Prike <prike.li...@amd.com> >> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander >> <alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>; >> Lazar, Lijo <lijo.la...@amd.com> >> Subject: Re: [PATCH 9/9] drm/amdgpu: validate userq activity status for >> GEM_VA >> unmap >> >> On Fri, May 30, 2025 at 4:05 AM Prike Liang <prike.li...@amd.com> wrote: >>> >>> The userq VA unmap requires validating queue status before unamapping >>> it, if user tries to unmap an active userq by GEM VA IOCTL then the >>> driver should report an error for this illegal usage. >>> >>> Signed-off-by: Prike Liang <prike.li...@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 16 +++++++++++++--- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +++++ >>> 2 files changed, 18 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c >>> index e43a61f64755..e2275280554d 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c >>> @@ -234,7 +234,7 @@ amdgpu_userq_map_helper(struct amdgpu_userq_mgr >> *uq_mgr, >>> return r; >>> } >>> >>> -static void >>> +static int >>> amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr, >>> struct amdgpu_usermode_queue *queue) >>> { @@ -243,10 +243,14 @@ amdgpu_userq_wait_for_last_fence(struct >>> amdgpu_userq_mgr *uq_mgr, >>> >>> if (f && !dma_fence_is_signaled(f)) { >>> ret = dma_fence_wait_timeout(f, true, >>> msecs_to_jiffies(100)); >>> - if (ret <= 0) >>> + if (ret <= 0) { >>> drm_file_err(uq_mgr->file, "Timed out waiting for >> fence=%llu:%llu\n", >>> f->context, f->seqno); >>> + return -ETIMEDOUT; >>> + } >>> } >>> + >>> + return 0; >>> } >>> >>> static void >>> @@ -464,7 +468,13 @@ amdgpu_userq_destroy(struct drm_file *filp, int >> queue_id) >>> mutex_unlock(&uq_mgr->userq_mutex); >>> return -EINVAL; >>> } >>> - amdgpu_userq_wait_for_last_fence(uq_mgr, queue); >>> + >>> + if (amdgpu_userq_wait_for_last_fence(uq_mgr, queue)) { >>> + drm_warn(adev_to_drm(uq_mgr->adev), "Don't destroy a busy >> userq\n"); >>> + mutex_unlock(&uq_mgr->userq_mutex); >>> + return -EINVAL; >>> + } >>> + >>> r = amdgpu_bo_reserve(queue->db_obj.obj, true); >>> if (!r) { >>> amdgpu_bo_unpin(queue->db_obj.obj); >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> index 5e075e8f0ca3..168fc3835aaf 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> @@ -1930,6 +1930,11 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device >> *adev, >>> struct amdgpu_vm *vm = bo_va->base.vm; >>> bool valid = true; >>> >>> + if (amdgpu_userq_gem_va_unmap_wait(vm)) { >>> + dev_warn(adev->dev, "shouldn't unmap the VA for an active >>> userq\n"); >>> + return -EINVAL; >>> + } >> >> There's not need to wait for anything. Just return an error if the userq >> has not been >> destroyed yet. > [Prike] The current userq destroy IOCTL request always comes after > amdgpu_vm_bo_unmap(), > the original destroy validation idea is that the userq BOs only be able to be > unmapped when they are > idle, so here do we need to do the BOs busy wait or just test and throw a > warning/error message when > the driver tries unmap a busy BO?
Yeah that discussion came up previously before. As far as I can see we can't return an error from unmap because the BO might still be used by a queue. At least on the gfx side such a behavior could easily lead to crashes because userspace doesn't expect that an unmap fails and tries to re-use the address range. What we can do is deactivate that queue and prevent it from becomming active again. Regards, Christian. > >> Alex >> >>> + >>> saddr /= AMDGPU_GPU_PAGE_SIZE; >>> >>> list_for_each_entry(mapping, &bo_va->valids, list) { >>> -- >>> 2.34.1 >>>