On 11/4/25 6:32 PM, Rodrigo Siqueira wrote:
On 10/20, Rodrigo Siqueira wrote:
When trying to unload amdgpu in the SteamDeck (TTY mode), the following
set of errors happens and the system gets unstable:
[..]
[drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0
amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test
failed on gfx_0.0.0 (-110).
amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[..]
amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command:
SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command:
SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
[..]
When the driver initializes the GPU, the PSP validates all the firmware
loaded, and after that, it is not possible to load any other firmware
unless the device is reset. What is happening in the load/unload
situation is that PSP halts the GC engine because it suspects that
something is amiss. To address this issue, this commit ensures that the
GPU is reset (mode 2 reset) in the unload sequence.
Suggested-by: Alex Deucher <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0d5585bc3b04..0a7bcb2d5a50 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3649,6 +3649,13 @@ static int amdgpu_device_ip_fini_early(struct
amdgpu_device *adev)
"failed to release exclusive mode on fini\n");
}
+ /* Reset the device before entirely removing it to avoid load issues
+ * caused by firmware validation.
+ */
+ r = amdgpu_asic_reset(adev);
+ if (r)
+ dev_err(adev->dev, "asic reset on %s failed\n", __func__);
+
return 0;
}
--
2.51.0
Hi,
I just want to follow-up about this patch. Do I need to make any other
modification?
Thanks
I was a little bit worried about implications for the 6.19 changes
around shutdown, but I talked to siqueira about some testing for it and
he did a wide array of testing across different GPUs with load/unload
and shutdown. To me this looks fine, but I think Alex should give his
thoughts too.
Reviewed-by: Mario Limonciello (AMD) <[email protected]>