Implement retry faults on Navi 4 in order to mitigate VM faults.
Based on my previous series (required for correct operation):

* Improve retry fault handling (v2)
* Improve soft IH ring

Fix a race condition between the VM update performed by
amdgpu_vm_handle_fault() and retry_cam_ack(), so that the
ACK is always done after the VM update has finished.
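
For illustration only, the intended ordering can be modelled in
plain userspace C. This is not the driver code: every name in it
(model_fence, model_cam_ack, model_handle_fault and so on) is
hypothetical, and it assumes that the fence associated with the
VM update is what gates the ACK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Events recorded by the model, in the order they happen. */
enum model_event {
	EV_FAULT,
	EV_UPDATE_SUBMITTED,
	EV_UPDATE_DONE,
	EV_CAM_ACK,
};

#define MODEL_MAX_EVENTS 8
static enum model_event model_log[MODEL_MAX_EVENTS];
static size_t model_log_len;

static void model_record(enum model_event ev)
{
	assert(model_log_len < MODEL_MAX_EVENTS);
	model_log[model_log_len++] = ev;
}

/* Hypothetical stand-in for a fence: signals once, runs one callback. */
struct model_fence {
	bool signaled;
	void (*cb)(void *data);
	void *data;
};

/* Run cb when the fence signals, or right away if it already has. */
static void model_fence_on_signal(struct model_fence *f,
				  void (*cb)(void *data), void *data)
{
	if (f->signaled) {
		cb(data);
		return;
	}
	f->cb = cb;
	f->data = data;
}

static void model_fence_signal(struct model_fence *f)
{
	f->signaled = true;
	if (f->cb) {
		void (*cb)(void *data) = f->cb;

		f->cb = NULL;
		cb(f->data);
	}
}

/* Hypothetical ACK: the model only records that it happened. */
static void model_cam_ack(void *data)
{
	(void)data;
	model_record(EV_CAM_ACK);
}

/* Fault path: submit the update, defer the ACK to the fence. */
static void model_handle_fault(struct model_fence *update_fence)
{
	model_record(EV_FAULT);
	model_record(EV_UPDATE_SUBMITTED);
	model_fence_on_signal(update_fence, model_cam_ack, NULL);
}

/* The asynchronous update completes some time later. */
static void model_update_complete(struct model_fence *update_fence)
{
	model_record(EV_UPDATE_DONE);
	model_fence_signal(update_fence);
}
```

In this model, calling model_handle_fault() alone never records
the ACK. It only appears once model_update_complete() signals the
fence, which is the ordering the race fix is meant to guarantee.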

Adjust soft IH ring size on Navi 4. Note that Navi 4
seems to send the retry fault interrupts on the first
IH ring so they end up being dispatched on the soft
IH ring.

Adjust the PTE flags to make the VM update work correctly
on Navi 4. Without that, the update seems to be stuck in
a cache and can't resolve the fault.

Enable the retry CAM on Navi 4 as well in order to filter
the retry fault interrupts. Change the IH v7.0 code to
use the MMIO-based ACK rather than a doorbell.
The doorbell does not seem to work at all on Navi 4,
just like it doesn't work on Navi 3.

With this series, the kernel is able to mitigate VM faults
when amdgpu.noretry=0 is specified on the kernel command line.

Timur Kristóf (7):
  drm/amdgpu/vm: Add fence argument to amdgpu_vm_handle_fault()
  drm/amdgpu: ACK the retry CAM after VM update finishes
  drm/amdgpu/ih7.0: Use MMIO ACK instead of doorbell for retry CAM on IH
    7.0
  drm/amdgpu/ih7.0: Use IH_SW_RING_SIZE for soft IH ring instead of
    PAGE_SIZE
  drm/amdgpu/gmc12.0: Use AMDGPU_PTE_IS_PTE flag for init_pte_flags on
    GFX12.0
  drm/amdgpu/vm: Use init PTE flags, and NOALLOC in
    amdgpu_vm_handle_fault()
  drm/amdgpu/gmc12: Pass cam_index to retry fault handler

 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c     | 30 ++++++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h     |  8 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      | 10 +++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c      |  8 ++++--
 drivers/gpu/drm/amd/amdgpu/gmc_v12_1.c      |  4 +--
 drivers/gpu/drm/amd/amdgpu/ih_v7_0.c        | 25 +++--------------
 8 files changed, 57 insertions(+), 32 deletions(-)

-- 
2.53.0
