There is a problem occurring on VCN 4.0.5 where in some situations a job is timing out. This triggers a job timeout which then causes a GPU reset for recovery. That has exposed a number of issues with GPU reset that have since been fixed. But also a GPU reset isn't actually needed for this circumstance. Just restarting the ring is enough.
Add a reset callback for the ring which will stop and start VCN if the issue happens. Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3909 Signed-off-by: Mario Limonciello <[email protected]> --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c index 558469744f3a..3e6e8127143b 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c @@ -1440,6 +1440,24 @@ static void vcn_v4_0_5_unified_ring_set_wptr(struct amdgpu_ring *ring) } } +static int vcn_v4_0_5_ring_reset(struct amdgpu_ring *ring, unsigned int vmid) +{ + struct amdgpu_device *adev = ring->adev; + int i; + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[i]; + + if (ring != &vinst->ring_enc[0]) + continue; + vcn_v4_0_5_stop(vinst); + vcn_v4_0_5_start(vinst); + break; + } + + return amdgpu_ring_test_helper(ring); +} + static struct amdgpu_ring_funcs vcn_v4_0_5_unified_ring_vm_funcs = { .type = AMDGPU_RING_TYPE_VCN_ENC, .align_mask = 0x3f, @@ -1467,6 +1485,7 @@ static struct amdgpu_ring_funcs vcn_v4_0_5_unified_ring_vm_funcs = { .emit_wreg = vcn_v2_0_enc_ring_emit_wreg, .emit_reg_wait = vcn_v2_0_enc_ring_emit_reg_wait, .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper, + .reset = vcn_v4_0_5_ring_reset, }; /** -- 2.49.0
