[Public]

We might want to add a TODO tag around the TLB fence creation to track a 
follow-up check from the KIQ/MES side.
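
As a rough illustration only, here is a user-space model of the gating with the kind of TODO I have in mind. This is not the driver code: `vm_model`, `update_params_model` and `wants_tlb_fence()` are hypothetical stand-ins for `struct amdgpu_vm`, `struct amdgpu_vm_update_params` and the condition in `amdgpu_vm_tlb_flush()`, and the TODO wording is just a suggestion.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant field of struct amdgpu_vm. */
struct vm_model {
	bool need_tlb_fence;	/* set for KFD, or when KGD userqs are enabled */
};

/* Hypothetical stand-in for struct amdgpu_vm_update_params. */
struct update_params_model {
	bool unlocked;
};

/*
 * Models the condition the patch adds before creating the TLB fence.
 *
 * TODO: need_tlb_fence avoids issues seen in KIQ/MES when the TLB
 * fence is created unconditionally; revisit once the KIQ/MES side
 * has been checked.
 */
static inline bool wants_tlb_fence(const struct update_params_model *params,
				   const struct vm_model *vm)
{
	return !params->unlocked && vm->need_tlb_fence;
}
```

In the real patch the comment would sit next to the `amdgpu_vm_tlb_fence_create()` call rather than on a helper.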

With or without it, the patch is
Reviewed-by: Prike Liang <[email protected]>

Regards,
      Prike

> -----Original Message-----
> From: Alex Deucher <[email protected]>
> Sent: Monday, March 16, 2026 11:17 PM
> To: [email protected]
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>; Liang, Prike <[email protected]>
> Subject: [PATCH] drm/amdgpu: rework how we handle TLB fences
>
> Add a new VM flag to indicate whether or not we need a TLB fence.  Userqs
> (KFD or KGD) require a TLB fence.  A TLB fence is not strictly required for
> kernel queues, but it shouldn't hurt.  That said, enabling this
> unconditionally should be fine, but it seems to tickle some issues in
> KIQ/MES.  Only enable them for KFD, or when KGD userq queues are enabled
> (currently via module parameter).
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749
> Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update")
> Cc: Christian König <[email protected]>
> Cc: Prike Liang <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 ++
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index b89013a6aa0b6..497464f50ea7d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1041,7 +1041,7 @@ amdgpu_vm_tlb_flush(struct amdgpu_vm_update_params *params,
>       }
>
>       /* Prepare a TLB flush fence to be attached to PTs */
> -     if (!params->unlocked) {
> +     if (!params->unlocked && vm->need_tlb_fence) {
>               amdgpu_vm_tlb_fence_create(params->adev, vm, fence);
>
>               /* Makes sure no PD/PT is freed before the flush */
> @@ -2573,6 +2573,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>       ttm_lru_bulk_move_init(&vm->lru_bulk_move);
>
>       vm->is_compute_context = false;
> +     vm->need_tlb_fence = amdgpu_userq_enabled(&adev->ddev);
>
>       vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
>                                   AMDGPU_VM_USE_CPU_FOR_GFX);
> @@ -2710,6 +2711,7 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>       dma_fence_put(vm->last_update);
>       vm->last_update = dma_fence_get_stub();
>       vm->is_compute_context = true;
> +     vm->need_tlb_fence = true;
>
>  unreserve_bo:
>       amdgpu_bo_unreserve(vm->root.bo);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index ae9449d5b00cd..25d176d1350ef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -444,6 +444,8 @@ struct amdgpu_vm {
>       struct ttm_lru_bulk_move lru_bulk_move;
>       /* Flag to indicate if VM is used for compute */
>       bool                    is_compute_context;
> +     /* Flag to indicate if VM needs a TLB fence (KFD or KGD) */
> +     bool                    need_tlb_fence;
>
>       /* Memory partition number, -1 means any partition */
>       int8_t                  mem_id;
> --
> 2.53.0
