On 12.08.25 10:00, Liu01 Tong wrote:
> During process kill, drm_sched_entity_flush() will kill the VM
> entities. Subsequent job submissions from this process will fail, but
> the resources of these jobs are never released and their fences are
> never signalled, causing tasks to hang and time out.
> 
> Fix by checking the entity status in amdgpu_vm_ready() and avoid
> submitting jobs to a stopped entity.

Looks good to me, but just to be on the safe side, please add another call to
amdgpu_vm_ready() in the function amdgpu_cs_vm_handling().

Put it right before we start updating the VM, e.g. after the
amdgpu_vmid_uses_reserved() check for the gang submission and before the call
to amdgpu_vm_clear_freed().
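
A minimal userspace sketch of the suggested ordering follows. Everything in it is hypothetical: struct mock_vm, mock_vm_ready() (a stand-in for amdgpu_vm_ready()), the mock_vm_clear_freed() stub and the -EINVAL return code are assumptions for illustration, not the actual amdgpu code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mock of the VM state seen by the CS path. "ready" stands
 * in for the result of amdgpu_vm_ready(); the counter records whether
 * the VM-update step was reached. */
struct mock_vm {
	bool ready;
	int clear_freed_calls;
};

static bool mock_vm_ready(const struct mock_vm *vm)
{
	return vm->ready;
}

/* Stub for amdgpu_vm_clear_freed(): in the driver this step can queue
 * page-table update jobs on the VM's scheduler entities. */
static int mock_vm_clear_freed(struct mock_vm *vm)
{
	vm->clear_freed_calls++;
	return 0;
}

/* Sketch of the ordering requested for amdgpu_cs_vm_handling(). */
static int mock_cs_vm_handling(struct mock_vm *vm)
{
	/* ... gang-submission amdgpu_vmid_uses_reserved() check goes here ... */

	/* Bail out before any VM update if the VM is evicting or its
	 * entities were stopped during process kill.
	 * The error code is an assumption for this sketch. */
	if (!mock_vm_ready(vm))
		return -EINVAL;

	/* First VM update; further updates would follow in the driver. */
	return mock_vm_clear_freed(vm);
}
```

With .ready = true the call reaches the clear_freed step and returns 0; with .ready = false it returns -EINVAL and the VM-update step is never reached, so nothing is queued to a stopped entity.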

Regards,
Christian.

> 
> Signed-off-by: Liu01 Tong <tong.li...@amd.com>
> Signed-off-by: Lin.Cao <linca...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 283dd44f04b0..bf42246a3db2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -654,11 +654,10 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   * Check if all VM PDs/PTs are ready for updates
>   *
>   * Returns:
> - * True if VM is not evicting.
> + * True if VM is not evicting and all VM entities are not stopped
>   */
>  bool amdgpu_vm_ready(struct amdgpu_vm *vm)
>  {
> -     bool empty;
>       bool ret;
>  
>       amdgpu_vm_eviction_lock(vm);
> @@ -666,10 +665,18 @@ bool amdgpu_vm_ready(struct amdgpu_vm *vm)
>       amdgpu_vm_eviction_unlock(vm);
>  
>       spin_lock(&vm->status_lock);
> -     empty = list_empty(&vm->evicted);
> +     ret &= list_empty(&vm->evicted);
>       spin_unlock(&vm->status_lock);
>  
> -     return ret && empty;
> +     spin_lock(&vm->immediate.lock);
> +     ret &= !vm->immediate.stopped;
> +     spin_unlock(&vm->immediate.lock);
> +
> +     spin_lock(&vm->delayed.lock);
> +     ret &= !vm->delayed.stopped;
> +     spin_unlock(&vm->delayed.lock);
> +
> +     return ret;
>  }
>  
>  /**
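
The behaviour change in amdgpu_vm_ready() can be modelled in userspace as a pure predicate. This sketch assumes the line elided between the two hunks sets ret = !vm->evicting; struct vm_state, vm_ready_old() and vm_ready_new() are hypothetical, and the kernel's locking is omitted.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state amdgpu_vm_ready() inspects. In the
 * kernel each field is read under its own lock: the eviction lock,
 * vm->status_lock, and the spinlock of each drm_sched_entity. */
struct vm_state {
	bool evicting;          /* vm->evicting */
	bool evicted_empty;     /* list_empty(&vm->evicted) */
	bool immediate_stopped; /* vm->immediate.stopped */
	bool delayed_stopped;   /* vm->delayed.stopped */
};

/* Readiness before the patch: only eviction state is considered. */
static bool vm_ready_old(const struct vm_state *s)
{
	return !s->evicting && s->evicted_empty;
}

/* Readiness after the patch: additionally false once either VM entity
 * has been stopped, mirroring the "ret &= ..." accumulation. */
static bool vm_ready_new(const struct vm_state *s)
{
	bool ret = !s->evicting;

	ret &= s->evicted_empty;
	ret &= !s->immediate_stopped;
	ret &= !s->delayed_stopped;
	return ret;
}
```

For a VM that is not evicting but whose immediate entity was stopped by drm_sched_entity_flush(), vm_ready_old() still returns true while vm_ready_new() returns false, which is the case the patch closes.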
