drm_sched_entity_flush() may kill the VM entities under certain conditions. KFD then still needs to run kfd_process_wq_release() to release the associated resources, so the killed entities can cause the process's subsequent job submissions to fail:

[ 3976.788183] [drm:amddrm_sched_entity_push_job [amd_sched]] *ERROR* Trying to push to a killed entity

Or

[ 129.600916] [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity

Signed-off-by: Gang Ba <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index bebf2ebc4f34..2361c09ddc77 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2997,6 +2997,9 @@ static int amdgpu_flush(struct file *f, fl_owner_t id)
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	long timeout = MAX_WAIT_SCHED_ENTITY_Q_EMPTY;
 
+	if (fpriv->vm.is_compute_context)
+		return 0;
+
 	timeout = amdgpu_ctx_mgr_entity_flush(&fpriv->ctx_mgr, timeout);
 	timeout = amdgpu_vm_wait_idle(&fpriv->vm, timeout);
 
-- 
2.34.1
