This set improves per-queue reset support for GC10+. When we reset a queue, its contents are lost, so we need to re-emit the unprocessed state from subsequent submissions. To make sure we actually restore that state, we need to enable legacy enforce isolation so that it can be re-emitted safely. If we don't, multiple jobs can run in parallel and we may not end up resetting the correct one. This is similar to how Windows handles queues. This also gives us correct guilty tracking for GC.
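As a rough illustration of the re-emit idea described above, here is a minimal, self-contained toy model in C. It is not the driver code: the names toy_ring, toy_job, toy_submit and toy_reset_and_reemit are hypothetical, the ring never wraps, and the "reset" is just a memset. It assumes jobs run strictly one at a time (which is what enforce isolation is meant to guarantee), so everything submitted after the guilty job is still unprocessed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_DW   64u               /* toy ring size in dwords (assumption) */
#define RING_MASK (RING_DW - 1u)
#define MAX_JOBS  8u

/* Hypothetical per-job bookkeeping: the ring range the job occupies. */
struct toy_job {
	uint32_t start_wptr;        /* write pointer before the job was emitted */
	uint32_t end_wptr;          /* write pointer after the job was emitted */
};

struct toy_ring {
	uint32_t buf[RING_DW];
	uint32_t wptr;
	struct toy_job jobs[MAX_JOBS];
	unsigned int num_jobs;
};

static void toy_ring_write(struct toy_ring *r, uint32_t v)
{
	r->buf[r->wptr & RING_MASK] = v;
	r->wptr++;
}

/* Emit ndw dwords of value tag and remember which ring range they cover. */
static void toy_submit(struct toy_ring *r, uint32_t tag, unsigned int ndw)
{
	struct toy_job *j;
	unsigned int i;

	assert(r->num_jobs < MAX_JOBS);
	assert(r->wptr + ndw <= RING_DW);   /* toy model: no wrap-around */

	j = &r->jobs[r->num_jobs++];
	j->start_wptr = r->wptr;
	for (i = 0; i < ndw; i++)
		toy_ring_write(r, tag);
	j->end_wptr = r->wptr;
}

/*
 * Model a queue reset: save the commands of every job submitted after the
 * guilty one, wipe the ring (the reset loses its contents), then re-emit
 * the saved commands from the start. Returns the number of dwords re-emitted.
 */
static unsigned int toy_reset_and_reemit(struct toy_ring *r,
					 unsigned int guilty_idx)
{
	uint32_t saved[RING_DW];
	unsigned int i, n = 0;
	uint32_t p;

	assert(guilty_idx < r->num_jobs);

	/* Back up the unprocessed ranges, skipping the guilty job itself. */
	for (i = guilty_idx + 1; i < r->num_jobs; i++)
		for (p = r->jobs[i].start_wptr; p != r->jobs[i].end_wptr; p++)
			saved[n++] = r->buf[p & RING_MASK];

	/* The reset: queue contents and pointers are gone. */
	memset(r->buf, 0, sizeof(r->buf));
	r->wptr = 0;
	r->num_jobs = 0;

	/* Re-emit what the hardware never got to run. */
	for (i = 0; i < n; i++)
		toy_ring_write(r, saved[i]);

	return n;
}
```

In this model, submitting three jobs and resetting with the first one marked guilty leaves only the second and third jobs' dwords in the ring, in their original order. If jobs could overlap in execution, the "everything after the guilty job is unprocessed" assumption would no longer hold, which is the reason given above for enabling enforce isolation.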
Tested on GC 10 and 11 chips with a game running and then running hang tests. The game pauses when the hang happens, then continues after the queue reset. I tried this same approach on GC8 and 9, but it was not as reliable as soft recovery. As such, I've dropped the KGQ reset code for pre-GC10.

The same approach can be extended to SDMA and VCN in the future. They don't need enforce isolation because those engines are single threaded, so they always operate serially.

Alex Deucher (18):
  drm/amdgpu/gfx10: enable legacy enforce isolation
  drm/amdgpu/gfx11: enable legacy enforce isolation
  drm/amdgpu/gfx12: enable legacy enforce isolation
  drm/amdgpu/gfx7: drop reset_kgq
  drm/amdgpu/gfx8: drop reset_kgq
  drm/amdgpu/gfx9: drop reset_kgq
  drm/amdgpu: add AMDGPU_QUEUE_RESET_TIMEOUT
  drm/amdgpu/ring: add helper for padding the ring
  drm/amdgpu: pad ring in amdgpu_ib_schedule
  drm/amdgpu: track ring state associated with a job
  drm/amdgpu/gfx10: re-emit unprocessed state on kgq reset
  drm/amdgpu/gfx11: re-emit unprocessed state on kgq reset
  drm/amdgpu/gfx12: re-emit unprocessed state on kgq reset
  drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
  drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
  drm/amdgpu/gfx10: re-emit unprocessed state on kcq reset
  drm/amdgpu/gfx11: re-emit unprocessed state on kcq reset
  drm/amdgpu/gfx12: re-emit unprocessed state on kcq reset

Christian König (1):
  drm/amdgpu: rework queue reset scheduler interaction

 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  8 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 32 ++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 52 ++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  6 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 57 ++++++++++---------
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   | 48 +++++++++-------
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c   | 48 +++++++++-------
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c    | 71 ------------------------
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c    | 71 ------------------------
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 59 ++++----------------
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c  | 15 ++++-
 13 files changed, 192 insertions(+), 278 deletions(-)

-- 
2.49.0