On Mon, Oct 20, 2025 at 12:16 PM Rodrigo Siqueira <[email protected]> wrote:
>
> When trying to unload amdgpu in the SteamDeck (TTY mode), the following
> set of errors happens and the system gets unstable:
>
> [..]
>  [drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0
>  amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110).
>  amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
> [..]
>  amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
>  amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
>  amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
>  amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
> [..]
>
> When the driver initializes the GPU, the PSP validates all the firmware
> loaded, and after that, it is not possible to load any other firmware
> unless the device is reset. What is happening in the load/unload
> situation is that PSP halts the GC engine because it suspects that
> something is amiss. To address this issue, this commit ensures that the
> GPU is reset (mode 2 reset) in the unload sequence.
>
> Suggested-by: Alex Deucher <[email protected]>
> Signed-off-by: Rodrigo Siqueira <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 0d5585bc3b04..0a7bcb2d5a50 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3649,6 +3649,13 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
>                                 "failed to release exclusive mode on fini\n");
>         }
>
> +       /* Reset the device before entirely removing it to avoid load issues
> +        * caused by firmware validation.
> +        */
> +       r = amdgpu_asic_reset(adev);
> +       if (r)
> +               dev_err(adev->dev, "asic reset on %s failed\n", __func__);
> +

I think this will break certain navi32 boards due to another quirk
they have. See

commit 7c1d9e10e6643121f1ffe9c0903467cc8682eba8
Author: Kenneth Feng <[email protected]>
Date:   Thu Mar 28 11:00:50 2024 +0800

    drm/amd/pm: fix the high voltage issue after unload

    fix the high voltage issue after unload on smu 13.0.10

    Signed-off-by: Kenneth Feng <[email protected]>
    Reviewed-by: Hawking Zhang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>


It would probably be best to limit this to small APUs.  Something like:

if ((adev->flags & AMD_IS_APU) && !adev->gmc.is_app_apu)
        r = amdgpu_asic_reset(adev);

dGPUs should already be covered by the need_reset_on_init() logic so
there is no need to reset them.
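
As a rough standalone sketch of that gate, outside the kernel tree: the
stub structs, the helper name should_reset_on_unload(), and the flag
value below are simplified stand-ins for illustration, not the real
amdgpu definitions. Only the condition itself is taken from the
suggestion above.

```c
#include <stdbool.h>

/* Stand-in for the kernel's AMD_IS_APU chip flag; the value is illustrative. */
#define AMD_IS_APU (1UL << 17)

/* Hypothetical, cut-down mirror of the two fields the check reads. */
struct gmc_stub {
	bool is_app_apu;
};

struct amdgpu_device_stub {
	unsigned long flags;
	struct gmc_stub gmc;
};

/*
 * True only for small APUs: the APU flag is set and the device is not
 * an app APU. dGPUs and app APUs skip the unload-time reset.
 */
static bool should_reset_on_unload(const struct amdgpu_device_stub *adev)
{
	return (adev->flags & AMD_IS_APU) && !adev->gmc.is_app_apu;
}
```

In the actual patch the helper would not be needed; the condition would
simply guard the amdgpu_asic_reset() call in
amdgpu_device_ip_fini_early().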

Alex

>         return 0;
>  }
>
> --
> 2.51.0
>
