amdgpu: Add ring reset support for VCN v5.0.1

Lijo Lazar Thu, 21 Aug 2025 14:39:12 -0700

On 8/20/2025 8:33 AM, Jesse.Zhang wrote:
> Implement the ring reset callback for VCN v5.0.1 to properly handle
> hardware recovery when encountering GPU hangs. The new functionality:
> 
> 1. Adds vcn_v5_0_1_ring_reset() function that:
>    - Prepares for reset using amdgpu_ring_reset_helper_begin()
>    - Performs VCN instance reset via amdgpu_dpm_reset_vcn()
>    - Re-initializes hardware through vcn_v5_0_1_hw_init_inst()
>    - Restarts DPG mode with vcn_v5_0_1_start_dpg_mode()
>    - Completes reset with amdgpu_ring_reset_helper_end()
> 
> 2. Hooks the reset function into the unified ring functions via:
>    - Adding .reset = vcn_v5_0_1_ring_reset to vcn_v5_0_1_unified_ring_vm_funcs
> 
> 3. Maintains existing behavior for SR-IOV VF cases by checking RRMT status
> 
> This provides proper hardware recovery capabilities for VCN 5.0.1 IP block
> during fault conditions, matching functionality available in other VCN 
> versions.
> 
> Signed-off-by: Jesse Zhang <jesse.zh...@amd.com>
> Signed-off-by: Ruili Ji <ruili...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 29 +++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
> index 1b5d44fa2b57..779043eac827 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
> @@ -1284,6 +1284,34 @@ static void vcn_v5_0_1_unified_ring_set_wptr(struct amdgpu_ring *ring)
>       }
>  }
>  
> +static int vcn_v5_0_1_ring_reset(struct amdgpu_ring *ring,
> +                              unsigned int vmid,
> +                              struct amdgpu_fence *timedout_fence)
> +{
> +     int r = 0;
> +     int vcn_inst;
> +     struct amdgpu_device *adev = ring->adev;
> +     struct amdgpu_vcn_inst *vinst = &adev->vcn.inst[ring->me];
> +
> +     amdgpu_ring_reset_helper_begin(ring, timedout_fence);
> +
> +     vcn_inst = GET_INST(VCN, ring->me);
> +     r = amdgpu_dpm_reset_vcn(adev, 1 << vcn_inst);
> +
> +     if (r) {
> +             DRM_DEV_ERROR(adev->dev, "VCN reset fail : %d\n", r);
> +             return r;
> +     }
> +
> +     /* This flag is not set for VF, assumed to be disabled always */
> +     if (RREG32_SOC15(VCN, GET_INST(VCN, 0), regVCN_RRMT_CNTL) & 0x100)
> +             adev->vcn.caps |= AMDGPU_VCN_CAPS(RRMT_ENABLED);

This is not required. The assumption is that the setting is common across
all instances, hence only the first instance's setting is read. So if VCN
instance 2 or 3 is reset, this doesn't matter.
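
A minimal standalone sketch of the point above, using mock types rather
than the real amdgpu structures. It assumes the capability is latched once
at init time from instance 0 only; mock_dev, mock_init_caps(),
mock_reset_inst() and CAP_RRMT_ENABLED are hypothetical names for
illustration, and only the 0x100 mask comes from the quoted patch:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_INST          4
#define RRMT_ENABLED_BIT  0x100u      /* mask used in the quoted patch */
#define CAP_RRMT_ENABLED  (1u << 0)   /* hypothetical stand-in for AMDGPU_VCN_CAPS(RRMT_ENABLED) */

/* Mock device: one RRMT control value per instance, one device-wide caps word. */
struct mock_dev {
	uint32_t rrmt_cntl[NUM_INST];
	uint32_t caps;
};

/* One-time init (assumed): only instance 0 is consulted for the capability. */
static void mock_init_caps(struct mock_dev *dev)
{
	if (dev->rrmt_cntl[0] & RRMT_ENABLED_BIT)
		dev->caps |= CAP_RRMT_ENABLED;
}

/*
 * Per-instance reset: nothing here re-reads the RRMT setting, because the
 * device-wide caps do not depend on which instance was reset.
 */
static int mock_reset_inst(struct mock_dev *dev, unsigned int inst)
{
	assert(inst < NUM_INST);
	(void)dev;	/* a real reset would go through the SMU; omitted in this mock */
	return 0;
}
```

In this mock, resetting instance 2 or 3 leaves caps exactly as the init
path set them, which is why the extra register read adds nothing.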

> +     vcn_v5_0_1_hw_init_inst(adev, ring->me);
> +     vcn_v5_0_1_start_dpg_mode(vinst, adev->vcn.inst[ring->me].indirect_sram);

You could use vinst->indirect_sram. That said, since vinst is already
passed in, there seems to be no need to pass this as an extra parameter.
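
A rough standalone sketch of the one-argument shape, with mock types
only. mock_vcn_inst and mock_start_dpg_mode() are hypothetical and are not
the driver's actual structures or signature; the idea is just that the
callee can read the flag from the instance it is handed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock instance: carries its own indirect_sram flag, like vinst in the patch. */
struct mock_vcn_inst {
	bool indirect_sram;	/* configured once for the instance */
	bool dpg_started;	/* set by the start routine below */
	bool used_indirect;	/* records which path the start routine took */
};

/* Hypothetical one-argument variant: the flag is read from the instance itself. */
static int mock_start_dpg_mode(struct mock_vcn_inst *vinst)
{
	assert(vinst != NULL);
	vinst->used_indirect = vinst->indirect_sram;
	vinst->dpg_started = true;
	return 0;
}
```

With this shape the caller cannot pass a flag that disagrees with the
instance it is starting.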

Thanks,
Lijo
> +
> +     return amdgpu_ring_reset_helper_end(ring, timedout_fence);
> +}
> +
>  static const struct amdgpu_ring_funcs vcn_v5_0_1_unified_ring_vm_funcs = {
>       .type = AMDGPU_RING_TYPE_VCN_ENC,
>       .align_mask = 0x3f,
> @@ -1312,6 +1340,7 @@ static const struct amdgpu_ring_funcs vcn_v5_0_1_unified_ring_vm_funcs = {
>       .emit_wreg = vcn_v4_0_3_enc_ring_emit_wreg,
>       .emit_reg_wait = vcn_v4_0_3_enc_ring_emit_reg_wait,
>       .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
> +     .reset = vcn_v5_0_1_ring_reset,
>  };
>  
>  /**
Re: [v3 2/5] drm/amdgpu: Add ring reset support for VCN v5.0.1
