On Tue, May 26, 2026 at 4:29 PM Boqun Feng <[email protected]> wrote:
>
> Hi Alex,
>
> On Thu, May 21, 2026 at 08:20:06PM -0400, Alex Deucher wrote:
> > There are certain corner cases where a queue reset is not
> > able to recover a hung queue.  A pipe reset can recover
> > some of those cases; however, when the pipe is reset,
> > all queues on that pipe are reset.  This requires
> > coordination across all components using compute queues.
> > There is quite a bit of prep work in this series, some
> > of which I sent out previously.  Another prerequisite
> > for this was reworking the userq reset path, which
> > should be more straightforward now.  The final patch also
> > needs to be updated once the new MES firmware is released so
> > we can check the proper firmware versions.  Using older
> > MES firmware may fail and end up in an adapter reset in some
> > cases where the pipe reset would have worked, so it should
> > be comparable to the current behavior.
> >
>
> Do you have a branch somewhere I can test with? Or what's the base
> commit of this patchset? Thanks!

I've pushed it here:
https://gitlab.freedesktop.org/agd5f/linux/-/commits/pipe_reset?ref_type=heads

Alex

>
> Regards,
> Boqun
>
> > Alex Deucher (34):
> >   drm/amdkfd: always resume_all after suspend_all
> >   drm/amdgpu: don't reemit if there is nothing to reemit
> >   drm/amdgpu: track guilty fence for queue reset
> >   drm/amdgpu/fence: add helper to extract the guilty fence
> >   drm/amdgpu: amdgpu_ring_set_fence_errors_and_reemit() handle NULL
> >     fence
> >   drm/amdgpu/vcn: handle pipe reset more gracefully
> >   drm/amdgpu/sdma: handle pipe reset more gracefully
> >   drm/amdgpu/mes12: use proper grbm_select function
> >   drm/amdgpu/gfx11: only need to remap KCQs when reset via MMIO
> >   drm/amdgpu/gfx12: only need to remap KCQs when reset via MMIO
> >   drm/amdgpu/mes11: move pipe reset to mes use_mmio patch
> >   drm/amdgpu/mes12: move pipe reset to mes use_mmio patch
> >   drm/amdgpu/mes: add userq reset helper
> >   drm/amdgpu/mes: add a MMIO queue reset helper
> >   drm/amdgpu/userq: split the queue reset from adapter reset
> >   drm/amdgpu/userq: add per queue reset callback
> >   drm/amdgpu/userq: add mes userq reset callback
> >   drm/amdgpu/userq: switch to per queue reset
> >   drm/amdgpu/userq: drop detect_and_reset callback
> >   drm/amdkfd: rework MES queue reset sequence
> >   drm/amdgpu/gfx: add a helper for MQD restore
> >   drm/amdgpu/gfx11: use the new MQD helper for queue reset
> >   drm/amdgpu/gfx12: use the new MQD helper for queue reset
> >   drm/amdgpu/gfx11: unmap the queue via MES on reset for MMIO path
> >   drm/amdgpu/gfx12: unmap the queue via MES on reset for MMIO path
> >   drm/amdgpu: store whether to use MMIO or MES for reset
> >   drm/amdgpu: Use a common KGQ and KCQ reset helper for gfx11/12
> >   drm/amdkfd: split out mes queue reset sequence into standalone
> >     function
> >   drm/amdkfd: plumb a helper to reset a KFD user queue
> >   drm/amdgpu/userq: add MES userq reset helper
> >   drm/amdgpu/gfx: add a common helper to handle MES compute resets
> >   drm/amdgpu: use a single entry point for mes compute reset
> >   drm/amdgpu/mes11: enable compute MMIO pipe reset
> >   drm/amdgpu/mes12: enable compute MMIO pipe reset
> >
> > Amber Lin (3):
> >   drm/amdgpu: Allocate enough space for hpd info on gfx11
> >   drm/amdkfd: Update queue reset support on KFD topology
> >   drm/amdgpu: Expand MES queue/pipe reset support
> >
> > Jesse Zhang (4):
> >   drm/amdgpu/mes_v12_0: use mes schedule pipe for legacy queues on
> >     unified MES
> >   drm/amdgpu/mes_v12_1: use mes schedule pipe for legacy queues on
> >     unified MES
> >   drm/amdgpu/gfx11: Refactor compute pipe reset and add HQD cleanup
> >   drm/amdgpu/gfx12: Refactor compute pipe reset and add HQD cleanup
> >
> > Shaoyun Liu (1):
> >   drm/amd/amdgpu/include : update mes api header v11/v12
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c    |  14 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  16 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  54 +++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       | 193 +++++++++++++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h       |  16 ++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  67 ++++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  14 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |   3 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |  19 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c     |  84 +++---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h     |   3 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  64 +++--
> >  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 264 +-----------------
> >  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c        | 216 +-------------
> >  drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c        |   2 +
> >  drivers/gpu/drm/amd/amdgpu/mes_userqueue.c    | 111 ++++----
> >  drivers/gpu/drm/amd/amdgpu/mes_userqueue.h    |   9 +
> >  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        | 250 ++++++++++++++++-
> >  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c        | 263 +++++++++++++++--
> >  drivers/gpu/drm/amd/amdgpu/mes_v12_1.c        |  22 +-
> >  drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  24 ++
> >  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 135 ++++-----
> >  .../drm/amd/amdkfd/kfd_device_queue_manager.h |   2 +
> >  drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |   3 +-
> >  drivers/gpu/drm/amd/include/mes_v11_api_def.h |   5 +-
> >  drivers/gpu/drm/amd/include/mes_v12_api_def.h |   5 +-
> >  26 files changed, 1150 insertions(+), 708 deletions(-)
> >
> > --
> > 2.54.0
> >
