Hi Alex,

On Thu, May 21, 2026 at 08:20:06PM -0400, Alex Deucher wrote:
> There are certain corner cases where a queue reset is not
> able to recover a hung queue.  A pipe reset can recover
> some of those cases, however, when the pipe is reset
> all queues on that pipe are reset.  This requires
> coordination across all components using compute queues.
> There is quite a bit of prep work in this series, some
> of which I sent out previously. Another prerequisite
> for this was reworking the userq reset path.  It should
> be more straightforward now.  The final patch also
> needs to be updated once the new MES firmware is released so
> we can check the proper firmware versions. Using older
> MES firmware may fail and end up in an adapter reset in some
> cases where the pipe reset would have worked so it should
> be comparable to the current behavior.
> 

Do you have a branch somewhere I can test with? Or what's the base
commit of this patchset? Thanks!

Regards,
Boqun

> Alex Deucher (34):
>   drm/amdkfd: always resume_all after suspend_all
>   drm/amdgpu: don't reemit if there is nothing to reemit
>   drm/amdgpu: track guilty fence for queue reset
>   drm/amdgpu/fence: add helper to extract the guilty fence
>   drm/amdgpu: amdgpu_ring_set_fence_errors_and_reemit() handle NULL
>     fence
>   drm/amdgpu/vcn: handle pipe reset more gracefully
>   drm/amdgpu/sdma: handle pipe reset more gracefully
>   drm/amdgpu/mes12: use proper grbm_select function
>   drm/amdgpu/gfx11: only need to remap KCQs when reset via MMIO
>   drm/amdgpu/gfx12: only need to remap KCQs when reset via MMIO
>   drm/amdgpu/mes11: move pipe reset to mes use_mmio patch
>   drm/amdgpu/mes12: move pipe reset to mes use_mmio patch
>   drm/amdgpu/mes: add userq reset helper
>   drm/amdgpu/mes: add a MMIO queue reset helper
>   drm/amdgpu/userq: split the queue reset from adapter reset
>   drm/amdgpu/userq: add per queue reset callback
>   drm/amdgpu/userq: add mes userq reset callback
>   drm/amdgpu/userq: switch to per queue reset
>   drm/amdgpu/userq: drop detect_and_reset callback
>   drm/amdkfd: rework MES queue reset sequence
>   drm/amdgpu/gfx: add a helper for MQD restore
>   drm/amdgpu/gfx11: use the new MQD helper for queue reset
>   drm/amdgpu/gfx12: use the new MQD helper for queue reset
>   drm/amdgpu/gfx11: unmap the queue via MES on reset for MMIO path
>   drm/amdgpu/gfx12: unmap the queue via MES on reset for MMIO path
>   drm/amdgpu: store whether to use MMIO or MES for reset
>   drm/amdgpu: Use a common KGQ and KCQ reset helper for gfx11/12
>   drm/amdkfd: split out mes queue reset sequence into standalone
>     function
>   drm/amdkfd: plumb a helper to reset a KFD user queue
>   drm/amdgpu/userq: add MES userq reset helper
>   drm/amdgpu/gfx: add a common helper to handle MES compute resets
>   drm/amdgpu: use a single entry point for mes compute reset
>   drm/amdgpu/mes11: enable compute MMIO pipe reset
>   drm/amdgpu/mes12: enable compute MMIO pipe reset
> 
> Amber Lin (3):
>   drm/amdgpu: Allocate enough space for hpd info on gfx11
>   drm/amdkfd: Update queue reset support on KFD topology
>   drm/amdgpu: Expand MES queue/pipe reset support
> 
> Jesse Zhang (4):
>   drm/amdgpu/mes_v12_0: use mes schedule pipe for legacy queues on
>     unified MES
>   drm/amdgpu/mes_v12_1: use mes schedule pipe for legacy queues on
>     unified MES
>   drm/amdgpu/gfx11: Refactor compute pipe reset and add HQD cleanup
>   drm/amdgpu/gfx12: Refactor compute pipe reset and add HQD cleanup
> 
> Shaoyun Liu (1):
>   drm/amd/amdgpu/include : update mes api header v11/v12
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |  14 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  16 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  54 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       | 193 +++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h       |  16 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  67 ++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  14 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |  19 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c     |  84 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h     |   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  64 +++--
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 264 +-----------------
>  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c        | 216 +-------------
>  drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c        |   2 +
>  drivers/gpu/drm/amd/amdgpu/mes_userqueue.c    | 111 ++++----
>  drivers/gpu/drm/amd/amdgpu/mes_userqueue.h    |   9 +
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        | 250 ++++++++++++++++-
>  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c        | 263 +++++++++++++++--
>  drivers/gpu/drm/amd/amdgpu/mes_v12_1.c        |  22 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  24 ++
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 135 ++++-----
>  .../drm/amd/amdkfd/kfd_device_queue_manager.h |   2 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   3 +-
>  drivers/gpu/drm/amd/include/mes_v11_api_def.h |   5 +-
>  drivers/gpu/drm/amd/include/mes_v12_api_def.h |   5 +-
>  26 files changed, 1150 insertions(+), 708 deletions(-)
> 
> -- 
> 2.54.0
> 
