There are certain corner cases where a queue reset cannot
recover a hung queue.  A pipe reset can recover some of those
cases.  However, resetting a pipe resets every queue on that
pipe, so it requires coordination across all components
that use compute queues.
There is quite a bit of prep work in this series, some
of which I sent out previously.  Another prerequisite
for this was reworking the userq reset path, which should
be more straightforward now.  The final patch also
needs to be updated once the new MES firmware is released so
we can check the proper firmware versions.  With older
MES firmware, the pipe reset may fail and end up in an adapter
reset in some cases where it would otherwise have worked, so
the result should be comparable to the current behavior.
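
As a rough illustration of the escalation described above (queue reset
first, then pipe reset, then adapter reset), here is a minimal
standalone model.  It is not code from this series: the struct, the
enum, the function name and QUEUES_PER_PIPE are all hypothetical, and
the two bool fields only simulate whether each reset level succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUES_PER_PIPE 4 /* hypothetical value, for illustration only */

enum reset_level { RESET_QUEUE, RESET_PIPE, RESET_ADAPTER };

/* Toy model of one compute pipe; not a driver structure. */
struct pipe_model {
	bool queue_reset_works; /* simulated: per-queue reset recovers the hang */
	bool pipe_reset_works;  /* simulated: pipe reset is supported and succeeds */
	int reset_count[QUEUES_PER_PIPE]; /* times each queue has been reset */
};

/* Try the least disruptive reset first and report which level was used. */
enum reset_level recover_hung_queue(struct pipe_model *pipe, size_t hung)
{
	assert(pipe != NULL && hung < QUEUES_PER_PIPE);

	if (pipe->queue_reset_works) {
		/* Only the hung queue is affected. */
		pipe->reset_count[hung]++;
		return RESET_QUEUE;
	}

	if (pipe->pipe_reset_works) {
		/* Every queue on the pipe is reset, hung or not. */
		for (size_t i = 0; i < QUEUES_PER_PIPE; i++)
			pipe->reset_count[i]++;
		return RESET_PIPE;
	}

	/* Nothing narrower worked, so fall back to a full adapter reset. */
	return RESET_ADAPTER;
}
```

In this model the pipe branch touches every queue's counter, which is
why all users of compute queues on that pipe have to be coordinated
before a real pipe reset.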

Alex Deucher (34):
  drm/amdkfd: always resume_all after suspend_all
  drm/amdgpu: don't reemit if there is nothing to reemit
  drm/amdgpu: track guilty fence for queue reset
  drm/amdgpu/fence: add helper to extract the guilty fence
  drm/amdgpu: amdgpu_ring_set_fence_errors_and_reemit() handle NULL
    fence
  drm/amdgpu/vcn: handle pipe reset more gracefully
  drm/amdgpu/sdma: handle pipe reset more gracefully
  drm/amdgpu/mes12: use proper grbm_select function
  drm/amdgpu/gfx11: only need to remap KCQs when reset via MMIO
  drm/amdgpu/gfx12: only need to remap KCQs when reset via MMIO
  drm/amdgpu/mes11: move pipe reset to mes use_mmio patch
  drm/amdgpu/mes12: move pipe reset to mes use_mmio patch
  drm/amdgpu/mes: add userq reset helper
  drm/amdgpu/mes: add a MMIO queue reset helper
  drm/amdgpu/userq: split the queue reset from adapter reset
  drm/amdgpu/userq: add per queue reset callback
  drm/amdgpu/userq: add mes userq reset callback
  drm/amdgpu/userq: switch to per queue reset
  drm/amdgpu/userq: drop detect_and_reset callback
  drm/amdkfd: rework MES queue reset sequence
  drm/amdgpu/gfx: add a helper for MQD restore
  drm/amdgpu/gfx11: use the new MQD helper for queue reset
  drm/amdgpu/gfx12: use the new MQD helper for queue reset
  drm/amdgpu/gfx11: unmap the queue via MES on reset for MMIO path
  drm/amdgpu/gfx12: unmap the queue via MES on reset for MMIO path
  drm/amdgpu: store whether to use MMIO or MES for reset
  drm/amdgpu: Use a common KGQ and KCQ reset helper for gfx11/12
  drm/amdkfd: split out mes queue reset sequence into standalone
    function
  drm/amdkfd: plumb a helper to reset a KFD user queue
  drm/amdgpu/userq: add MES userq reset helper
  drm/amdgpu/gfx: add a common helper to handle MES compute resets
  drm/amdgpu: use a single entry point for mes compute reset
  drm/amdgpu/mes11: enable compute MMIO pipe reset
  drm/amdgpu/mes12: enable compute MMIO pipe reset

Amber Lin (3):
  drm/amdgpu: Allocate enough space for hpd info on gfx11
  drm/amdkfd: Update queue reset support on KFD topology
  drm/amdgpu: Expand MES queue/pipe reset support

Jesse Zhang (4):
  drm/amdgpu/mes_v12_0: use mes schedule pipe for legacy queues on
    unified MES
  drm/amdgpu/mes_v12_1: use mes schedule pipe for legacy queues on
    unified MES
  drm/amdgpu/gfx11: Refactor compute pipe reset and add HQD cleanup
  drm/amdgpu/gfx12: Refactor compute pipe reset and add HQD cleanup

Shaoyun Liu (1):
  drm/amd/amdgpu/include : update mes api header v11/v12

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |  14 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  54 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       | 193 +++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h       |  16 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  67 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  14 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |  19 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c     |  84 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h     |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  64 +++--
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 264 +-----------------
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c        | 216 +-------------
 drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c        |   2 +
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.c    | 111 ++++----
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.h    |   9 +
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        | 250 ++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c        | 263 +++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/mes_v12_1.c        |  22 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  24 ++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 135 ++++-----
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   3 +-
 drivers/gpu/drm/amd/include/mes_v11_api_def.h |   5 +-
 drivers/gpu/drm/amd/include/mes_v12_api_def.h |   5 +-
 26 files changed, 1150 insertions(+), 708 deletions(-)

-- 
2.54.0
