On Mon, Oct 20, 2025 at 12:16 PM Rodrigo Siqueira <[email protected]> wrote:
>
> When trying to unload amdgpu in the SteamDeck (TTY mode), the following
> set of errors happens and the system gets unstable:
>
> [..]
>  [drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0
>  amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test
> failed on gfx_0.0.0 (-110).
>  amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
> [..]
>  amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command:
> SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
>  amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
>  amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command:
> SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
>  amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
> [..]
>
> When the driver initializes the GPU, the PSP validates all the firmware
> loaded, and after that, it is not possible to load any other firmware
> unless the device is reset. What is happening in the load/unload
> situation is that PSP halts the GC engine because it suspects that
> something is amiss. To address this issue, this commit ensures that the
> GPU is reset (mode 2 reset) in the unload sequence.
>
> Suggested-by: Alex Deucher <[email protected]>
> Signed-off-by: Rodrigo Siqueira <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 0d5585bc3b04..0a7bcb2d5a50 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3649,6 +3649,13 @@ static int amdgpu_device_ip_fini_early(struct
> amdgpu_device *adev)
>                                 "failed to release exclusive mode on fini\n");
>         }
>
> +       /* Reset the device before entirely removing it to avoid load issues
> +        * caused by firmware validation.
> +        */
> +
> +       if (r)
> +               dev_err(adev->dev, "asic reset on %s failed\n", __func__);
> +

I think this will break certain navi32 boards due to another quirk
they have.  See

commit 7c1d9e10e6643121f1ffe9c0903467cc8682eba8
Author: Kenneth Feng <[email protected]>
Date:   Thu Mar 28 11:00:50 2024 +0800

    drm/amd/pm: fix the high voltage issue after unload

    fix the high voltage issue after unload on smu 13.0.10

    Signed-off-by: Kenneth Feng <[email protected]>
    Reviewed-by: Hawking Zhang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>

It would probably be best to limit this to small APUs.  Something like:

if ((adev->flags & AMD_IS_APU) && !adev->gmc.is_app_apu)
        r = amdgpu_asic_reset(adev);

dGPUs should already be covered by the need_reset_on_init() logic so
there is no need to reset them.

Alex

>         return 0;
>  }
>
> --
> 2.51.0
>
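
For readers following the thread, the gating condition suggested in the reply can be modelled outside the kernel as a small predicate. This is only a sketch under stated assumptions: the `sketch_*` types and the `SKETCH_IS_APU` bit are hypothetical stand-ins for `struct amdgpu_device`, `adev->gmc.is_app_apu` and `AMD_IS_APU`, and nothing here calls the real `amdgpu_asic_reset()`.

```c
#include <stdbool.h>

/* Hypothetical flag bit, for illustration only; the real AMD_IS_APU
 * is defined by the kernel and is not reproduced here. */
#define SKETCH_IS_APU (1UL << 17)

/* Hypothetical stand-in for the gmc member of struct amdgpu_device. */
struct sketch_gmc {
	bool is_app_apu;
};

/* Hypothetical stand-in for struct amdgpu_device; only the two fields
 * used by the suggested condition are modelled. */
struct sketch_device {
	unsigned long flags;
	struct sketch_gmc gmc;
};

/* Returns true when the unload path would reset the device under the
 * condition proposed in the reply: an APU that is not an "app APU".
 * dGPUs return false, since the reply says they are expected to be
 * covered by the need_reset_on_init() logic instead. */
static bool sketch_should_reset_on_unload(const struct sketch_device *adev)
{
	return (adev->flags & SKETCH_IS_APU) && !adev->gmc.is_app_apu;
}
```

With this shape, the reset stays confined to small APUs such as the one in the SteamDeck report, while the navi32 boards mentioned above never reach the reset call on unload.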
