On 10/17/25 16:04, Gang Ba wrote:
> drm_sched_entity_flush() may kill the VM entities under certain conditions.
> KFD then needs to call kfd_process_wq_release to release the associated
> resources, which can cause subsequent job submissions of the process to fail.
>
> [ 3976.788183] [drm:amddrm_sched_entity_push_job [amd_sched]] *ERROR* Trying to push to a killed entity
> Or
> [ 129.600916] [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity

Clear NAK. When the process is killed, the KFD should not try to submit any VM updates any more.

Regards,
Christian.

>
> Signed-off-by: Gang Ba <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index bebf2ebc4f34..2361c09ddc77 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2997,6 +2997,9 @@ static int amdgpu_flush(struct file *f, fl_owner_t id)
>  	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
>  	long timeout = MAX_WAIT_SCHED_ENTITY_Q_EMPTY;
>
> +	if (fpriv->vm.is_compute_context)
> +		return 0;
> +
>  	timeout = amdgpu_ctx_mgr_entity_flush(&fpriv->ctx_mgr, timeout);
>  	timeout = amdgpu_vm_wait_idle(&fpriv->vm, timeout);
>
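
For illustration only, the objection above (stop submitting once the process is killed, rather than skipping the flush) could be sketched in plain userspace C roughly as follows. Every name here (sketch_entity, sketch_process, sketch_vm_update and so on) is hypothetical and is not an amdgpu, KFD or DRM scheduler symbol; the error codes are arbitrary choices for the sketch, and it says nothing about how the real driver tracks a killed process.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu/KFD structures. */
struct sketch_entity {
	bool stopped;		/* set once the entity has been killed */
	unsigned int queued;	/* number of jobs accepted so far */
};

struct sketch_process {
	bool killed;		/* process is being torn down */
	struct sketch_entity vm_entity;
};

/* Models a flush that kills the entity because its owner is exiting. */
static void sketch_entity_kill(struct sketch_process *p)
{
	assert(p != NULL);
	p->killed = true;
	p->vm_entity.stopped = true;
}

/* Low-level push: refuses jobs for a stopped entity. */
static int sketch_entity_push_job(struct sketch_entity *e)
{
	if (e->stopped)
		return -ENOENT;	/* the "push to a killed entity" case */
	e->queued++;
	return 0;
}

/* Caller-side guard: once the process is killed, do not even try. */
static int sketch_vm_update(struct sketch_process *p)
{
	if (p->killed)
		return -ESRCH;
	return sketch_entity_push_job(&p->vm_entity);
}
```

In this sketch the caller checks the process state before it reaches the push path, so the error branch in sketch_entity_push_job() is never hit for a killed process.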
