On old GPUs, it may be an issue that handling the interrupts from VM faults is too slow and the interrupt handler (IH) ring may overflow, which can cause an eventual hang.
Delegate the processing of all VM faults to the soft IRQ handler ring. As a result, we spend much less time in the IRQ handler that interacts with the HW IH ring, which significantly reduces the chance of hangs/reboots. Signed-off-by: Timur Kristóf <[email protected]> --- drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c index e1509480dfc2..6551b60f2584 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c @@ -1439,6 +1439,12 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev, return 0; } + /* Delegate to the soft IRQ handler ring */ + if (adev->irq.ih_soft.enabled && entry->ih != &adev->irq.ih_soft) { + amdgpu_irq_delegate(adev, entry, 4); + return 1; + } + addr = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_ADDR); status = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_STATUS); mc_client = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_MCCLIENT); -- 2.51.1
