Hi Alex,

On Thu, May 21, 2026 at 08:20:06PM -0400, Alex Deucher wrote:
> There are certain corner cases where a queue reset is not
> able to recover a hung queue. A pipe reset can recover
> some of those cases, however, when the pipe is reset
> all queues on that pipe are reset. This requires
> coordination across all components using compute queues.
> There is quite a bit of prep work in this series, some
> of which I sent out previously. Another prerequisite
> for this was reworking the userq reset path. It should
> be more straight-forward now. The final patch also
> needs to be updated once the new MES firmware is released so
> we can check the proper firmware versions. Using older
> MES firmware may fail and end up in an adapter reset in some
> cases where the pipe reset would have worked so it should
> be comparable to the current behavior.
>

Do you have a branch somewhere I can test with? Or what's the base
commit of this patchset? Thanks!

Regards,
Boqun

> Alex Deucher (34):
>   drm/amdkfd: always resume_all after suspend_all
>   drm/amdgpu: don't reemit if there is nothing to reemit
>   drm/amdgpu: track guilty fence for queue reset
>   drm/amdgpu/fence: add helper to extract the guilty fence
>   drm/amdgpu: amdgpu_ring_set_fence_errors_and_reemit() handle NULL
>     fence
>   drm/amdgpu/vcn: handle pipe reset more gracefully
>   drm/amdgpu/sdma: handle pipe reset more gracefully
>   drm/amdgpu/mes12: use proper grbm_select function
>   drm/amdgpu/gfx11: only need to remap KCQs when reset via MMIO
>   drm/amdgpu/gfx12: only need to remap KCQs when reset via MMIO
>   drm/amdgpu/mes11: move pipe reset to mes use_mmio patch
>   drm/amdgpu/mes12: move pipe reset to mes use_mmio patch
>   drm/amdgpu/mes: add userq reset helper
>   drm/amdgpu/mes: add a MMIO queue reset helper
>   drm/amdgpu/userq: split the queue reset from adapter reset
>   drm/amdgpu/userq: add per queue reset callback
>   drm/amdgpu/userq: add mes userq reset callback
>   drm/amdgpu/userq: switch to per queue reset
>   drm/amdgpu/userq: drop detect_and_reset callback
>   drm/amdkfd: rework MES queue reset sequence
>   drm/amdgpu/gfx: add a helper for MQD restore
>   drm/amdgpu/gfx11: use the new MQD helper for queue reset
>   drm/amdgpu/gfx12: use the new MQD helper for queue reset
>   drm/amdgpu/gfx11: unmap the queue via MES on reset for MMIO path
>   drm/amdgpu/gfx12: unmap the queue via MES on reset for MMIO path
>   drm/amdgpu: store whether to use MMIO or MES for reset
>   drm/amdgpu: Use a common KGQ and KCQ reset helper for gfx11/12
>   drm/amdkfd: split out mes queue reset sequence into standalone
>     function
>   drm/amdkfd: plumb a helper to reset a KFD user queue
>   drm/amdgpu/userq: add MES userq reset helper
>   drm/amdgpu/gfx: add a common helper to handle MES compute resets
>   drm/amdgpu: use a single entry point for mes compute reset
>   drm/amdgpu/mes11: enable compute MMIO pipe reset
>   drm/amdgpu/mes12: enable compute MMIO pipe reset
>
> Amber Lin (3):
>   drm/amdgpu: Allocate enough space for hpd info on gfx11
>   drm/amdkfd: Update queue reset support on KFD topology
>   drm/amdgpu: Expand MES queue/pipe reset support
>
> Jesse Zhang (4):
>   drm/amdgpu/mes_v12_0: use mes schedule pipe for legacy queues on
>     unified MES
>   drm/amdgpu/mes_v12_1: use mes schedule pipe for legacy queues on
>     unified MES
>   drm/amdgpu/gfx11: Refactor compute pipe reset and add HQD cleanup
>   drm/amdgpu/gfx12: Refactor compute pipe reset and add HQD cleanup
>
> Shaoyun Liu (1):
>   drm/amd/amdgpu/include : update mes api header v11/v12
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |  14 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  16 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  54 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       | 193 +++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h       |  16 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  67 ++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  14 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |  19 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c     |  84 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h     |   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  64 +++--
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 264 +-----------------
>  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c        | 216 +-------------
>  drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c        |   2 +
>  drivers/gpu/drm/amd/amdgpu/mes_userqueue.c    | 111 ++++----
>  drivers/gpu/drm/amd/amdgpu/mes_userqueue.h    |   9 +
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        | 250 ++++++++++++++++-
>  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c        | 263 +++++++++++++++--
>  drivers/gpu/drm/amd/amdgpu/mes_v12_1.c        |  22 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  24 ++
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 135 ++++-----
>  .../drm/amd/amdkfd/kfd_device_queue_manager.h |   2 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   3 +-
>  drivers/gpu/drm/amd/include/mes_v11_api_def.h |   5 +-
>  drivers/gpu/drm/amd/include/mes_v12_api_def.h |   5 +-
>  26 files changed, 1150 insertions(+), 708 deletions(-)
>
> --
> 2.54.0
>
