On 9/14/25 12:25 PM, Jérôme Lécuyer wrote:
Since 6.16.4, I am no longer able to use my dGPU.
It is visible in nvtop for a brief moment after the system boots,
but once it enters D3cold, it can't wake up (and no longer appears in nvtop).
Specifications:
Laptop with
AMD Ryzen 5 4600H (iGPU)
AMD Radeon RX 5500M (dGPU), not overclocked (at least manually), goes to D3cold often
~Arch Linux, KDE, Wayland, tried multiple kernels before and after 6.16.4.
Kernel versions:
dGPU works fine in 6.16.3 and before.
The issue started appearing in 6.16.4 and persists with 6.16.7 and 6.17-rc5.
Bisect using aur/linux-git remote torvalds/linux found:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c97636cc83d4591c0c91b6f80eaca3434d7d3e3a
dmesg after starting nvtop:
[ 32.931442] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[ 32.931460] amdgpu 0000:03:00.0: amdgpu: PSP is resuming...
[ 33.086921] amdgpu 0000:03:00.0: amdgpu: reserve 0x900000 from 0x80fd000000 for PSP TMR
[ 33.130797] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 33.136900] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 33.136903] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
[ 33.136907] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ 33.167904] amdgpu 0000:03:00.0: amdgpu: OverDrive is not enabled!
[ 33.167909] amdgpu 0000:03:00.0: amdgpu: resume of IP block <smu> failed -22
[ 33.167912] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-22).
The OverDrive line is a warning. The last two lines are errors (-22 is -EINVAL).
Building with this change on top of commit 22f20375f5b7 fixed the issue.
https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/22f20375f5b71f30c0d6896583b93b6e4bba7279
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index b47cb4a5f488..408f05dfab90 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2236,7 +2236,7 @@ static int smu_resume(struct amdgpu_ip_block *ip_block)
 		return ret;
 	}
 
-	if (smu_dpm_ctx->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL) {
+	if (smu_dpm_ctx->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL && smu->od_enabled) {
 		ret = smu_od_edit_dpm_table(smu, PP_OD_COMMIT_DPM_TABLE, NULL, 0);
 		if (ret)
 			return ret;
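
For illustration only, here is a small userspace model of why the extra
check seems to help. It is a sketch, not kernel code: every type and
function name below (model_smu, model_od_commit, model_resume_*) is a
hypothetical stand-in, and it assumes that the OverDrive table commit
returns -EINVAL (-22 on Linux) when OverDrive is disabled, which is what
the "OverDrive is not enabled!" warning followed by "failed -22" in the
log above suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
enum model_dpm_level { MODEL_DPM_LEVEL_AUTO, MODEL_DPM_LEVEL_MANUAL };

struct model_smu {
	enum model_dpm_level dpm_level;
	bool od_enabled;
};

/*
 * Assumed behaviour of the OverDrive table commit: refuse with -EINVAL
 * when OverDrive is disabled (inferred from the warning in the log).
 */
static int model_od_commit(const struct model_smu *smu)
{
	if (!smu->od_enabled)
		return -EINVAL;
	return 0;
}

/* Resume path before the change: commit whenever the level is manual. */
int model_resume_unpatched(const struct model_smu *smu)
{
	if (smu->dpm_level == MODEL_DPM_LEVEL_MANUAL)
		return model_od_commit(smu);
	return 0;
}

/* Resume path with the change: skip the commit unless OverDrive is on. */
int model_resume_patched(const struct model_smu *smu)
{
	if (smu->dpm_level == MODEL_DPM_LEVEL_MANUAL && smu->od_enabled)
		return model_od_commit(smu);
	return 0;
}

/* Self-check covering the failing case and the unaffected ones. */
void model_self_check(void)
{
	const struct model_smu manual_no_od = { MODEL_DPM_LEVEL_MANUAL, false };
	const struct model_smu manual_od = { MODEL_DPM_LEVEL_MANUAL, true };
	const struct model_smu auto_no_od = { MODEL_DPM_LEVEL_AUTO, false };

	/* Manual level with OverDrive off: only the unpatched path fails. */
	assert(model_resume_unpatched(&manual_no_od) == -EINVAL);
	assert(model_resume_patched(&manual_no_od) == 0);

	/* All other combinations behave the same with or without the change. */
	assert(model_resume_unpatched(&manual_od) == model_resume_patched(&manual_od));
	assert(model_resume_unpatched(&auto_no_od) == model_resume_patched(&auto_no_od));
}
```

In this model the only state whose result changes is "manual level with
OverDrive disabled", which would match a dGPU that is not overclocked but
still ends up at the manual level on resume.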
dGPU behaves normally now.
...
[ 275.490129] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ 275.521159] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
[ 275.522179] amdgpu 0000:03:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
[ 275.525009] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
[ 275.525023] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
...
Thanks,
Jérôme
It makes sense. Can you send out a properly formatted patch to the
mailing list with all the tags (Fixes/Closes/Signed-off-by)? Alternatively,
if you would rather I write one up based on yours and send it out (giving
you a Suggested-by), I can do that too.