Sometimes user space submits a bad command stream to the kernel. With the
current scheme, the GPU scheduler always resubmits all unsignaled jobs to the
HW ring after a GPU reset, so the bad submission triggers GPU hangs
indefinitely.

This patch series implements a mechanism called "guilty context", which
avoids resubmitting malicious jobs and invalidates the context behind them.
That way regular applications can continue to run, and other VFs lose less
GPU time.

The guilty charge is simple: if a job hangs more times than the threshold
allows, we consider it guilty, invalidate the context behind it, and pop all
jobs out of its entities on each scheduler. The next IOCTL on this CTX handle
gets an -ENODEV error, so the UMD knows that the driver released this context
because of its malicious command submission.
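
The rule above can be sketched in plain user-space C. This is only an
illustration of the idea, not the code in this series: HANG_LIMIT, struct ctx,
struct job, and all three function names are hypothetical, and the exact
comparison against the threshold is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: hangs tolerated before a job is judged guilty. */
#define HANG_LIMIT 2

/* Hypothetical stand-in for a submission context. */
struct ctx {
	bool guilty;
};

/* Hypothetical stand-in for a scheduler job. */
struct job {
	struct ctx *ctx;
	int karma;	/* number of hangs attributed to this job */
};

/*
 * Called for the job that caused a hang. Returns true if the job may be
 * resubmitted after the reset, false if it exceeded the threshold, in
 * which case its context is marked guilty.
 */
static bool job_hang_should_resubmit(struct job *job)
{
	assert(job != NULL && job->ctx != NULL);

	job->karma++;
	if (job->karma > HANG_LIMIT) {
		job->ctx->guilty = true;
		return false;
	}
	return true;
}

/*
 * Drops every queued job that belongs to a guilty context, compacting the
 * queue in place. Returns the number of jobs that remain.
 */
static size_t queue_drop_guilty(struct job **queue, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (!queue[i]->ctx->guilty)
			queue[kept++] = queue[i];
	}
	return kept;
}

/* Check made at the start of any later ioctl on the context. */
static int ctx_ioctl_check(const struct ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->guilty ? -ENODEV : 0;
}
```

In this sketch the guilty flag lives on the context, so a single check covers
both the queue cleanup and the later ioctl rejection.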

Monk Liu (5):
  drm/amdgpu:keep ctx alive till all job finished
  drm/amdgpu:some modifications in amdgpu_ctx
  drm/amdgpu:Impl guilty ctx feature for sriov TDR
  drm/amdgpu:change sriov_gpu_reset interface
  drm/amdgpu:sriov TDR only recover hang ring

 drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        | 26 ++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 39 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 43 ++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       | 30 +++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h      |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c         |  2 +-
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 87 ++++++++++++++++++++++++---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  3 +
 13 files changed, 209 insertions(+), 47 deletions(-)

-- 
2.7.4

_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
