There are certain corner cases where a queue reset cannot
recover a hung queue. A pipe reset can recover some of
those cases. However, resetting a pipe resets all queues
on that pipe, so it requires coordination across all
components using compute queues.

There is quite a bit of prep work in this series, some
of which I sent out previously. Another prerequisite
for this was reworking the userq reset path, which should
be more straightforward now.

The final patch also needs to be updated once the new MES
firmware is released so we can check the proper firmware
versions. With older MES firmware, the reset may fail and
end up in an adapter reset in some cases where the pipe
reset would otherwise have worked, which is comparable to
the current behavior.

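The recovery order described above (queue reset, then pipe reset, then adapter reset) can be sketched roughly as follows. This is an illustrative stand-alone model only, not code from the series: every name here (`sketch_recover`, `sketch_pipe`, `QUEUES_PER_PIPE`, the owner enum) is hypothetical, and the two `*_works` flags simply stand in for whatever the hardware and firmware would report.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUES_PER_PIPE 4 /* hypothetical value, real hardware differs */

enum reset_level { RESET_QUEUE, RESET_PIPE, RESET_ADAPTER };

/* Hypothetical owners that may share one compute pipe. */
enum queue_owner { OWNER_KERNEL, OWNER_USERQ, OWNER_KFD };

struct sketch_queue {
	enum queue_owner owner;
	bool needs_restore; /* set when this queue was reset and must be restored */
};

struct sketch_pipe {
	struct sketch_queue queues[QUEUES_PER_PIPE];
	bool queue_reset_works; /* stand-in for the queue reset outcome */
	bool pipe_reset_works;  /* stand-in for the pipe reset outcome */
};

/*
 * Try the least disruptive reset first and escalate on failure.
 * Returns the level that was finally used.
 */
enum reset_level sketch_recover(struct sketch_pipe *pipe, size_t hung)
{
	size_t i;

	if (pipe->queue_reset_works) {
		/* Only the hung queue is affected. */
		pipe->queues[hung].needs_restore = true;
		return RESET_QUEUE;
	}

	if (pipe->pipe_reset_works) {
		/*
		 * A pipe reset takes down every queue on the pipe, so every
		 * owner has to be told to restore its queue.
		 */
		for (i = 0; i < QUEUES_PER_PIPE; i++)
			pipe->queues[i].needs_restore = true;
		return RESET_PIPE;
	}

	/* Last resort; the adapter-wide recovery itself is not modeled here. */
	return RESET_ADAPTER;
}
```

The loop in the pipe reset branch is the part that needs cross-component coordination: queues owned by different components sit on the same pipe and are all reset together.
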
Alex Deucher (34):
drm/amdkfd: always resume_all after suspend_all
drm/amdgpu: don't reemit if there is nothing to reemit
drm/amdgpu: track guilty fence for queue reset
drm/amdgpu/fence: add helper to extract the guilty fence
drm/amdgpu: amdgpu_ring_set_fence_errors_and_reemit() handle NULL fence
drm/amdgpu/vcn: handle pipe reset more gracefully
drm/amdgpu/sdma: handle pipe reset more gracefully
drm/amdgpu/mes12: use proper grbm_select function
drm/amdgpu/gfx11: only need to remap KCQs when reset via MMIO
drm/amdgpu/gfx12: only need to remap KCQs when reset via MMIO
drm/amdgpu/mes11: move pipe reset to mes use_mmio patch
drm/amdgpu/mes12: move pipe reset to mes use_mmio patch
drm/amdgpu/mes: add userq reset helper
drm/amdgpu/mes: add a MMIO queue reset helper
drm/amdgpu/userq: split the queue reset from adapter reset
drm/amdgpu/userq: add per queue reset callback
drm/amdgpu/userq: add mes userq reset callback
drm/amdgpu/userq: switch to per queue reset
drm/amdgpu/userq: drop detect_and_reset callback
drm/amdkfd: rework MES queue reset sequence
drm/amdgpu/gfx: add a helper for MQD restore
drm/amdgpu/gfx11: use the new MQD helper for queue reset
drm/amdgpu/gfx12: use the new MQD helper for queue reset
drm/amdgpu/gfx11: unmap the queue via MES on reset for MMIO path
drm/amdgpu/gfx12: unmap the queue via MES on reset for MMIO path
drm/amdgpu: store whether to use MMIO or MES for reset
drm/amdgpu: Use a common KGQ and KCQ reset helper for gfx11/12
drm/amdkfd: split out mes queue reset sequence into standalone function
drm/amdkfd: plumb a helper to reset a KFD user queue
drm/amdgpu/userq: add MES userq reset helper
drm/amdgpu/gfx: add a common helper to handle MES compute resets
drm/amdgpu: use a single entry point for mes compute reset
drm/amdgpu/mes11: enable compute MMIO pipe reset
drm/amdgpu/mes12: enable compute MMIO pipe reset
Amber Lin (3):
drm/amdgpu: Allocate enough space for hpd info on gfx11
drm/amdkfd: Update queue reset support on KFD topology
drm/amdgpu: Expand MES queue/pipe reset support
Jesse Zhang (4):
drm/amdgpu/mes_v12_0: use mes schedule pipe for legacy queues on unified MES
drm/amdgpu/mes_v12_1: use mes schedule pipe for legacy queues on unified MES
drm/amdgpu/gfx11: Refactor compute pipe reset and add HQD cleanup
drm/amdgpu/gfx12: Refactor compute pipe reset and add HQD cleanup
Shaoyun Liu (1):
drm/amd/amdgpu/include : update mes api header v11/v12
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 16 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 54 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 193 +++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 16 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 67 ++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 14 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 +
drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 19 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 84 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 64 +++--
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 264 +-----------------
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 216 +-------------
drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c | 2 +
drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 111 ++++----
drivers/gpu/drm/amd/amdgpu/mes_userqueue.h | 9 +
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 250 ++++++++++++++++-
drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 263 +++++++++++++++--
drivers/gpu/drm/amd/amdgpu/mes_v12_1.c | 22 +-
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 24 ++
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 135 ++++-----
.../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 3 +-
drivers/gpu/drm/amd/include/mes_v11_api_def.h | 5 +-
drivers/gpu/drm/amd/include/mes_v12_api_def.h | 5 +-
26 files changed, 1150 insertions(+), 708 deletions(-)
--
2.54.0