[PATCH] drm/amdgpu: Fix KIQ fence timeout after GPU reset on GFX v9.4.3

Chenglei Xie Tue, 03 Mar 2026 08:19:37 -0800

After GPU reset, the hardware queue is cleared and all pending fences
are lost. However, the fence writeback memory remains stale from before
reset, while software continues emitting fences and sync_seq keeps
incrementing. This causes amdgpu_fence_emit_polling() to wait for
fences that were lost during reset, resulting in -ETIMEDOUT errors.

Fix this by updating the fence writeback memory to match sync_seq after
GPU reset in gfx_v9_4_3_xcc_kiq_init_queue(). This aligns the hardware's
view of completed fences with software's view of emitted fences,
preventing timeouts when waiting for fences that no longer exist.

Signed-off-by: Chenglei Xie <[email protected]>
Change-Id: I717df52ed0ef0bb51a6901f218191d9837a77f6f
---
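Note (not part of the commit): the failure mode described above can be
reproduced in a small user-space model. This is only an illustrative
sketch, not driver code. Every toy_* name is hypothetical, and the
bounded loop merely approximates a polling wait that compares the
writeback value against a target sequence number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the ring's fence bookkeeping. */
struct toy_fence_drv {
	uint32_t sync_seq;	/* last sequence number emitted by software */
	uint32_t writeback;	/* last sequence number reported as completed */
};

/* Software emits a fence: only sync_seq advances. */
static uint32_t toy_fence_emit(struct toy_fence_drv *drv)
{
	return ++drv->sync_seq;
}

/* Model of the hardware signalling a fence by writing its sequence number. */
static void toy_hw_signal(struct toy_fence_drv *drv, uint32_t seq)
{
	drv->writeback = seq;
}

/*
 * Poll until the writeback value reaches seq or the tries run out.
 * The signed difference keeps the comparison correct across wraparound.
 */
static bool toy_fence_wait_polling(const struct toy_fence_drv *drv,
				   uint32_t seq, unsigned int tries)
{
	while (tries--) {
		if ((int32_t)(drv->writeback - seq) >= 0)
			return true;
	}
	return false;
}

/* The idea of this patch: after reset, treat every emitted fence as done. */
static void toy_resync_after_reset(struct toy_fence_drv *drv)
{
	drv->writeback = drv->sync_seq;
}
```

In this model, a fence emitted just before a reset is never signalled, so
toy_fence_wait_polling() on it exhausts its tries. Once
toy_resync_after_reset() has run, the same wait succeeds immediately.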
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index ad4d442e7345e..6b5fcdd987693 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -2135,6 +2135,15 @@ static int gfx_v9_4_3_xcc_kiq_init_queue(struct amdgpu_ring *ring, int xcc_id)
                gfx_v9_4_3_xcc_kiq_init_register(ring, xcc_id);
                soc15_grbm_select(adev, 0, 0, 0, 0, GET_INST(GC, xcc_id));
                mutex_unlock(&adev->srbm_mutex);
+
+               /* Update fence writeback memory to align with software state after reset.
+                * After GPU reset, the hardware queue is cleared and all pending fences
+                * are lost. The fence writeback memory may be stale from before reset. To prevent
+                * waiting for lost fences, update writeback memory to match sync_seq.
+                * This avoids waiting for lost fences and prevents timeouts.
+                */
+                if (ring->fence_drv.cpu_addr)
+                       *ring->fence_drv.cpu_addr = cpu_to_le32(ring->fence_drv.sync_seq);
        } else {
                memset((void *)mqd, 0, sizeof(struct v9_mqd_allocation));
                ((struct v9_mqd_allocation *)mqd)->dynamic_cu_mask = 0xFFFFFFFF;
-- 
2.34.1
