Applied.  Thanks!

Alex

On Wed, Jun 17, 2026 at 3:54 AM Jakob Linke <[email protected]> wrote:
>
> For SOC24 ASICs (RDNA4 / Navi 4x dGPUs), re-enabling PM features fails if an
> S3 suspend was aborted. The same issue is already handled for SOC21 and SOC15:
>
>   commit df3c7dc5c58b ("drm/amdgpu: Reset dGPU if suspend got aborted")
>   commit 38e8ca3e4b6d ("amdgpu/soc15: enable asic reset for dGPU in case of suspend abort")
>
> The resume after an aborted suspend fails with:
>
>   amdgpu: SMU: No response msg_reg: 6 resp_reg: 0
>   amdgpu: Failed to enable requested dpm features!
>   amdgpu: resume of IP block <smu> failed -62
>
> Apply the same workaround for soc24: detect the aborted-suspend state at
> resume via the sign-of-life register and reset the device before re-init.
>
> This is a workaround until a proper solution is finalized.
>
> Fixes: 98b912c50e44 ("drm/amdgpu: Add soc24 common ip block (v2)")
> Cc: [email protected]
> Signed-off-by: Jakob Linke <[email protected]>
> ---
> Tested on Navi 44 (RX 9060 XT): recovers the deep->s2idle fallback and pure
> s2idle resumes that otherwise fail with "resume of IP block <smu> failed -62".
> It did not recover every case: one resume still failed under sustained rapid
> s2idle cycling, so like the SOC21/SOC15 versions this is a mitigation, not a
> complete fix. Single suspends in normal use recover.
>
>  drivers/gpu/drm/amd/amdgpu/soc24.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c b/drivers/gpu/drm/amd/amdgpu/soc24.c
> index ecb6c3fcfbd1..a970d8a76302 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc24.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc24.c
> @@ -521,8 +521,36 @@ static int soc24_common_suspend(struct amdgpu_ip_block *ip_block)
>         return soc24_common_hw_fini(ip_block);
>  }
>
> +static bool soc24_need_reset_on_resume(struct amdgpu_device *adev)
> +{
> +       u32 sol_reg1, sol_reg2;
> +
> +       /* Will reset for the following suspend abort cases.
> +        * 1) Only reset dGPU side.
> +        * 2) S3 suspend got aborted and TOS is active.
> +        *    As for dGPU suspend abort cases the SOL value
> +        *    will be kept as zero at this resume point.
> +        */
> +       if (!(adev->flags & AMD_IS_APU) && adev->in_s3) {
> +               sol_reg1 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
> +               msleep(100);
> +               sol_reg2 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
> +
> +               return (sol_reg1 != sol_reg2);
> +       }
> +
> +       return false;
> +}
> +
>  static int soc24_common_resume(struct amdgpu_ip_block *ip_block)
>  {
> +       struct amdgpu_device *adev = ip_block->adev;
> +
> +       if (soc24_need_reset_on_resume(adev)) {
> +               dev_info(adev->dev, "S3 suspend aborted, resetting...\n");
> +               soc24_asic_reset(adev);
> +       }
> +
>         return soc24_common_hw_init(ip_block);
>  }
>
> --
> 2.54.0
>
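
For readers following the thread, the check in soc24_need_reset_on_resume()
can be modelled outside the kernel. The sketch below is a userspace
illustration only, not driver code: struct fake_dev, read_reg_fn,
read_ticking() and read_static() are hypothetical names, and reading a
changing register value as "firmware still running" is an assumption drawn
from the patch's comment about TOS being active. The 100 ms delay between
the two reads is left out of the model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a register read; not an amdgpu API. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/* Hypothetical model of the device state the check looks at. */
struct fake_dev {
	bool is_apu;          /* models adev->flags & AMD_IS_APU */
	bool in_s3;           /* models adev->in_s3 */
	read_reg_fn read_sol; /* models reading the sign-of-life register */
	void *ctx;
};

/*
 * Same shape as the patch: only dGPUs resuming from S3 are checked, and
 * a reset is requested when two reads of the register differ.
 */
static bool need_reset_on_resume(const struct fake_dev *dev)
{
	uint32_t first, second;

	if (dev->is_apu || !dev->in_s3)
		return false;

	first = dev->read_sol(dev->ctx);
	/* The driver sleeps 100 ms here; omitted in this model. */
	second = dev->read_sol(dev->ctx);

	return first != second;
}

/* Fake register whose value advances on every read. */
static uint32_t read_ticking(void *ctx)
{
	uint32_t *counter = ctx;

	return (*counter)++;
}

/* Fake register that always reads back zero. */
static uint32_t read_static(void *ctx)
{
	(void)ctx;
	return 0;
}
```

With read_ticking() the model asks for a reset; with read_static(), on an
APU, or outside S3 it does not.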
