Applied.  Thanks!

Alex

On Wed, Jun 17, 2026 at 3:54 AM Jakob Linke <[email protected]> wrote:
>
> For SOC24 ASICs (RDNA4 / Navi 4x dGPUs) re-enabling PM features fails if an
> S3 suspend got aborted, the same issue already handled for SOC21 and SOC15:
>
>   commit df3c7dc5c58b ("drm/amdgpu: Reset dGPU if suspend got aborted")
>   commit 38e8ca3e4b6d ("amdgpu/soc15: enable asic reset for dGPU in case of
> suspend abort")
>
> The aborted resume fails with:
>
>   amdgpu: SMU: No response msg_reg: 6 resp_reg: 0
>   amdgpu: Failed to enable requested dpm features!
>   amdgpu: resume of IP block <smu> failed -62
>
> Apply the same workaround for soc24: detect the aborted-suspend state at
> resume via the sign-of-life register and reset the device before re-init.
>
> This is a workaround till a proper solution is finalized.
>
> Fixes: 98b912c50e44 ("drm/amdgpu: Add soc24 common ip block (v2)")
> Cc: [email protected]
> Signed-off-by: Jakob Linke <[email protected]>
> ---
> Tested on Navi 44 (RX 9060 XT): recovers the deep->s2idle fallback and pure
> s2idle resumes that otherwise fail with "resume of IP block <smu> failed -62".
> It did not recover every case: one resume still failed under sustained rapid
> s2idle cycling, so like the SOC21/SOC15 versions this is a mitigation, not a
> complete fix. Single suspends in normal use recover.
>
>  drivers/gpu/drm/amd/amdgpu/soc24.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c
> b/drivers/gpu/drm/amd/amdgpu/soc24.c
> index ecb6c3fcfbd1..a970d8a76302 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc24.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc24.c
> @@ -521,8 +521,36 @@ static int soc24_common_suspend(struct amdgpu_ip_block
> *ip_block)
>  	return soc24_common_hw_fini(ip_block);
>  }
>
> +static bool soc24_need_reset_on_resume(struct amdgpu_device *adev)
> +{
> +	u32 sol_reg1, sol_reg2;
> +
> +	/* Will reset for the following suspend abort cases.
> +	 * 1) Only reset dGPU side.
> +	 * 2) S3 suspend got aborted and TOS is active.
> +	 * As for dGPU suspend abort cases the SOL value
> +	 * will be kept as zero at this resume point.
> +	 */
> +	if (!(adev->flags & AMD_IS_APU) && adev->in_s3) {
> +		sol_reg1 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
> +		msleep(100);
> +		sol_reg2 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
> +
> +		return (sol_reg1 != sol_reg2);
> +	}
> +
> +	return false;
> +}
> +
>  static int soc24_common_resume(struct amdgpu_ip_block *ip_block)
>  {
> +	struct amdgpu_device *adev = ip_block->adev;
> +
> +	if (soc24_need_reset_on_resume(adev)) {
> +		dev_info(adev->dev, "S3 suspend aborted, resetting...");
> +		soc24_asic_reset(adev);
> +	}
> +
>  	return soc24_common_hw_init(ip_block);
>  }
>
> --
> 2.54.0
>
