On Tue, May 26, 2026 at 4:29 PM Boqun Feng <[email protected]> wrote:
>
> Hi Alex,
>
> On Thu, May 21, 2026 at 08:20:06PM -0400, Alex Deucher wrote:
> > There are certain corner cases where a queue reset is not
> > able to recover a hung queue. A pipe reset can recover
> > some of those cases, however, when the pipe is reset
> > all queues on that pipe are reset. This requires
> > coordination across all components using compute queues.
> > There is quite a bit of prep work in this series, some
> > of which I sent out previously. Another prerequisite
> > for this was reworking the userq reset path. It should
> > be more straight-forward now. The final patch also
> > needs to be updated once the new MES firmware is relased so
> > we can check the proper firmware versions. Using older
> > MES firmware may fail and end up in an adapter reset in some
> > cases where the pipe reset would have worked so it should
> > be comparable to the current behavior.
> >
>
> Do you have a branch somewhere I can test with? Or what's the base
> commit of this patchset? Thanks!

I've pushed it here:
https://gitlab.freedesktop.org/agd5f/linux/-/commits/pipe_reset?ref_type=heads

Alex

>
> Regards,
> Boqun
>
> > Alex Deucher (34):
> >   drm/amdkfd: always resume_all after suspend_all
> >   drm/amdgpu: don't reemit if there is nothing to reemit
> >   drm/amdgpu: track guilty fence for queue reset
> >   drm/amdgpu/fence: add helper to extract the guilty fence
> >   drm/amdgpu: amdgpu_ring_set_fence_errors_and_reemit() handle NULL
> >     fence
> >   drm/amdgpu/vcn: handle pipe reset more gracefully
> >   drm/amdgpu/sdma: handle pipe reset more gracefully
> >   drm/amdgpu/mes12: use proper grbm_select function
> >   drm/amdgpu/gfx11: only need to remap KCQs when reset via MMIO
> >   drm/amdgpu/gfx12: only need to remap KCQs when reset via MMIO
> >   drm/amdgpu/mes11: move pipe reset to mes use_mmio patch
> >   drm/amdgpu/mes12: move pipe reset to mes use_mmio patch
> >   drm/amdgpu/mes: add userq reset helper
> >   drm/amdgpu/mes: add a MMIO queue reset helper
> >   drm/amdgpu/userq: split the queue reset from adapter reset
> >   drm/amdgpu/userq: add per queue reset callback
> >   drm/amdgpu/userq: add mes userq reset callback
> >   drm/amdgpu/userq: switch to per queue reset
> >   drm/amdgpu/userq: drop detect_and_reset callback
> >   drm/amdkfd: rework MES queue reset sequence
> >   drm/amdgpu/gfx: add a helper for MQD restore
> >   drm/amdgpu/gfx11: use the new MQD helper for queue reset
> >   drm/amdgpu/gfx12: use the new MQD helper for queue reset
> >   drm/amdgpu/gfx11: unmap the queue via MES on reset for MMIO path
> >   drm/amdgpu/gfx12: unmap the queue via MES on reset for MMIO path
> >   drm/amdgpu: store whether to use MMIO or MES for reset
> >   drm/amdgpu: Use a common KGQ and KCQ reset helper for gfx11/12
> >   drm/amdkfd: split out mes queue reset sequence into standalone
> >     function
> >   drm/amdkfd: plumb a helper to reset a KFD user queue
> >   drm/amdgpu/userq: add MES userq reset helper
> >   drm/amdgpu/gfx: add a common helper to handle MES compute resets
> >   drm/amdgpu: use a single entry point for mes compute reset
> >   drm/amdgpu/mes11: enable compute MMIO pipe reset
> >   drm/amdgpu/mes12: enable compute MMIO pipe reset
> >
> > Amber Lin (3):
> >   drm/amdgpu: Allocate enough space for hpd info on gfx11
> >   drm/amdkfd: Update queue reset support on KFD topology
> >   drm/amdgpu: Expand MES queue/pipe reset support
> >
> > Jesse Zhang (4):
> >   drm/amdgpu/mes_v12_0: use mes schedule pipe for legacy queues on
> >     unified MES
> >   drm/amdgpu/mes_v12_1: use mes schedule pipe for legacy queues on
> >     unified MES
> >   drm/amdgpu/gfx11: Refactor compute pipe reset and add HQD cleanup
> >   drm/amdgpu/gfx12: Refactor compute pipe reset and add HQD cleanup
> >
> > Shaoyun Liu (1):
> >   drm/amd/amdgpu/include : update mes api header v11/v12
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |  14 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  16 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  54 +++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       | 193 +++++++++++++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h       |  16 ++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  67 ++++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  14 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |   3 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |  19 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c     |  84 +++---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h     |   3 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  64 +++--
> >  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 264 +-----------------
> >  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c        | 216 +-------------
> >  drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c        |   2 +
> >  drivers/gpu/drm/amd/amdgpu/mes_userqueue.c    | 111 ++++----
> >  drivers/gpu/drm/amd/amdgpu/mes_userqueue.h    |   9 +
> >  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        | 250 ++++++++++++++++-
> >  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c        | 263 +++++++++++++++--
> >  drivers/gpu/drm/amd/amdgpu/mes_v12_1.c        |  22 +-
> >  drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  24 ++
> >  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 135 ++++-----
> >  .../drm/amd/amdkfd/kfd_device_queue_manager.h |   2 +
> >  drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   3 +-
> >  drivers/gpu/drm/amd/include/mes_v11_api_def.h |   5 +-
> >  drivers/gpu/drm/amd/include/mes_v12_api_def.h |   5 +-
> >  26 files changed, 1150 insertions(+), 708 deletions(-)
> >
> > --
> > 2.54.0
> >
