On 9/14/25 12:25 PM, Jérôme Lécuyer wrote:
Since 6.16.4, I am no longer able to use my dGPU.

It is visible in nvtop for a brief moment after the system boots,
but once it is D3cold, it can't wake up (not in nvtop anymore).

Specifications:
Laptop with
AMD Ryzen 5 4600H (iGPU)
AMD Radeon RX 5500M (dGPU), not overclocked (at least manually), goes to D3cold often
~Arch Linux, KDE, Wayland, tried multiple kernels before and after 6.16.4.

Kernel versions:
dGPU works fine in 6.16.3 and before.
The issue started appearing in 6.16.4 and persists with 6.16.7 and 6.17-rc5.
Bisect using aur/linux-git remote torvalds/linux found:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c97636cc83d4591c0c91b6f80eaca3434d7d3e3a

dmesg after starting nvtop:

[   32.931442] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[   32.931460] amdgpu 0000:03:00.0: amdgpu: PSP is resuming...
[   33.086921] amdgpu 0000:03:00.0: amdgpu: reserve 0x900000 from 0x80fd000000 for PSP TMR
[   33.130797] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[   33.136900] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[   33.136903] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
[   33.136907] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[   33.167904] amdgpu 0000:03:00.0: amdgpu: OverDrive is not enabled!
[   33.167909] amdgpu 0000:03:00.0: amdgpu: resume of IP block <smu> failed -22
[   33.167912] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-22).

The OverDrive message is a warning. The last two lines are errors.


Building with this change on top of commit 22f20375f5b7 fixed the issue.
https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/22f20375f5b71f30c0d6896583b93b6e4bba7279

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index b47cb4a5f488..408f05dfab90 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2236,7 +2236,7 @@ static int smu_resume(struct amdgpu_ip_block *ip_block)
                         return ret;
         }

-       if (smu_dpm_ctx->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL) {
+       if (smu_dpm_ctx->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL && smu->od_enabled) {
                 ret = smu_od_edit_dpm_table(smu, PP_OD_COMMIT_DPM_TABLE, NULL, 0);
                 if (ret)
                         return ret;


dGPU behaves normally now.

...
[  275.490129] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[  275.521159] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
[  275.522179] amdgpu 0000:03:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
[  275.525009] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
[  275.525023] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
...
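
A minimal standalone sketch of why the extra check helps. It assumes, based on the "OverDrive is not enabled!" warning being followed directly by -22, that the OD commit path returns -EINVAL when OverDrive is off and that the DPM level was manual at resume time. The struct, enum, and function names are hypothetical stand-ins for illustration, not the real amdgpu definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver state; not the amdgpu types. */
enum dpm_level { DPM_LEVEL_AUTO, DPM_LEVEL_MANUAL };

struct smu_model {
    enum dpm_level dpm_level;
    bool od_enabled;
};

/* Assumption: committing the OD table is rejected when OverDrive is off. */
static int od_commit_dpm_table(const struct smu_model *smu)
{
    if (!smu->od_enabled)
        return -EINVAL;
    return 0;
}

/* Resume check before the change: commit whenever the level is manual. */
static int resume_unguarded(const struct smu_model *smu)
{
    if (smu->dpm_level == DPM_LEVEL_MANUAL)
        return od_commit_dpm_table(smu);
    return 0;
}

/* Resume check with the change: commit only if OverDrive is also enabled. */
static int resume_guarded(const struct smu_model *smu)
{
    if (smu->dpm_level == DPM_LEVEL_MANUAL && smu->od_enabled)
        return od_commit_dpm_table(smu);
    return 0;
}

/* Self-check for the state assumed above: manual level, OverDrive off. */
static void check_reported_state(void)
{
    struct smu_model smu = { .dpm_level = DPM_LEVEL_MANUAL, .od_enabled = false };

    assert(resume_unguarded(&smu) == -EINVAL); /* resume fails */
    assert(resume_guarded(&smu) == 0);         /* resume succeeds */
}
```

Calling check_reported_state() from a main() exercises both variants. With the level manual and OverDrive disabled, the unguarded variant returns -EINVAL (-22 on Linux), matching the failure in the first log, while the guarded variant returns 0.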


Thanks,
Jérôme


It makes sense. Can you send out a properly formatted patch to the M/L with all the tags (Fixes/Closes/S-o-b)? Or, if you would rather I write up and send a patch based on yours (and give you a Suggested-by), I can do that too.
