[AMD Official Use Only - AMD Internal Distribution Only]

-----Original Message-----
From: Alex Deucher <[email protected]>
Sent: Wednesday, November 19, 2025 11:46 AM
To: Wang, Yang(Kevin) <[email protected]>
Cc: [email protected]; Zhang, Hawking <[email protected]>; 
Deucher, Alexander <[email protected]>; Li, Candice <[email protected]>
Subject: Re: [PATCH] drm/amd/pm: fix amdgpu_irq enabled counter unbalanced on 
smu v11.0

On Tue, Nov 18, 2025 at 9:59 PM Yang Wang <[email protected]> wrote:
>
> Fix the unbalanced amdgpu_irq enabled counter in
> smu_v11_0_disable_thermal_alert().
>
> [  357.773144] ------------[ cut here ]------------
> [  357.773156] WARNING: CPU: 21 PID: 2202 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xd8/0xf0 [amdgpu]
> ...
> [  357.774651] Tainted: [E]=UNSIGNED_MODULE
> [  357.774656] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F14a 08/14/2020
> [  357.774664] RIP: 0010:amdgpu_irq_put+0xd8/0xf0 [amdgpu]
> [  357.775563] Code: 31 f6 31 ff e9 f9 c3 4f cb 44 89 f2 4c 89 e6 4c 89 ef e8 db fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 d8 c3 4f cb <0f> 0b eb c3 b8 fe ff ff ff eb 97 e9 d3 8d 8b 00 0f 1f 84 00 00 00
> [  357.775573] RSP: 0018:ffffd28616ecba58 EFLAGS: 00010246
> [  357.775584] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> [  357.775592] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [  357.775598] RBP: ffffd28616ecba78 R08: 0000000000000000 R09: 0000000000000000
> [  357.775605] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aac201a8008
> [  357.775611] R13: ffff8aac0e600000 R14: 0000000000000000 R15: ffff8aac201a8000
> [  357.775618] FS:  0000751c697b7c40(0000) GS:ffff8acb4fba2000(0000) knlGS:0000000000000000
> [  357.775627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  357.775634] CR2: 00005a844a5e7028 CR3: 0000001039a0f000 CR4: 00000000003506f0
> [  357.775642] Call Trace:
> [  357.775649]  <TASK>
> [  357.775663]  smu_v11_0_disable_thermal_alert+0x17/0x30 [amdgpu]
> [  357.776704]  smu_smc_hw_cleanup+0x79/0x500 [amdgpu]
> [  357.777857]  smu_hw_fini+0x139/0x200 [amdgpu]
> [  357.778908]  amdgpu_ip_block_hw_fini+0x29/0xc0 [amdgpu]
> [  357.779698]  amdgpu_device_fini_hw+0x2e5/0x560 [amdgpu]
> [  357.780487]  ? blocking_notifier_chain_unregister+0x3e/0x70
> [  357.780511]  amdgpu_driver_unload_kms+0x4b/0x70 [amdgpu]
> [  357.781334]  amdgpu_pci_remove+0x50/0x90 [amdgpu]
> [  357.782126]  pci_device_remove+0x41/0xc0
> [  357.782145]  device_remove+0x46/0x80
> [  357.782159]  device_release_driver_internal+0x203/0x270
> [  357.782169]  ? srso_return_thunk+0x5/0x5f
> [  357.782189]  driver_detach+0x4a/0xa0
> [  357.782201]  bus_remove_driver+0x83/0x110
> [  357.782216]  driver_unregister+0x31/0x60
> [  357.782227]  pci_unregister_driver+0x40/0x90
> [  357.782244]  amdgpu_exit+0x15/0x3b [amdgpu]
>
> Signed-off-by: Yang Wang <[email protected]>
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> index 78e4186d06cc..24d9f576846b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> @@ -1022,7 +1022,12 @@ int smu_v11_0_enable_thermal_alert(struct smu_context *smu)
>
>  int smu_v11_0_disable_thermal_alert(struct smu_context *smu)
>  {
> -       return amdgpu_irq_put(smu->adev, &smu->irq_source, 0);
> +       int ret = 0;
> +
> +       if (smu->smu_table.thermal_controller_type)
> +               ret = amdgpu_irq_get(smu->adev, &smu->irq_source, 0);

Shouldn't this be amdgpu_irq_put()?  With that fixed,
Reviewed-by: Alex Deucher <[email protected]>

[kevin]:

Yes, thanks, my mistake. I forgot to sync my local changes to the patch file
before sending it out for review. I will fix it before submitting.

Best Regards,
Kevin

> +
> +       return ret;
>  }
>
>  static uint16_t convert_to_vddc(uint8_t vid)
> --
> 2.34.1
>
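
For readers following the thread, here is a minimal userspace sketch of the
imbalance being discussed. It is not the amdgpu code: every toy_* name is
hypothetical, the warning is modelled with a plain fprintf, and it assumes
(as the guard in the patch suggests) that the enable path only takes the
IRQ reference when thermal_controller_type is nonzero.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an IRQ source with an "enabled" refcount. */
struct toy_irq_src {
	int enabled;
};

/* Hypothetical stand-in for the SMU context fields used in the patch. */
struct toy_smu {
	int thermal_controller_type;
	struct toy_irq_src irq_source;
};

/* Take one reference on the IRQ source. */
static int toy_irq_get(struct toy_irq_src *src)
{
	src->enabled++;
	return 0;
}

/* Drop one reference; complain if none is held (models the WARNING). */
static int toy_irq_put(struct toy_irq_src *src)
{
	if (src->enabled == 0) {
		fprintf(stderr, "WARNING: put on an IRQ source that is not enabled\n");
		return -1;
	}
	src->enabled--;
	return 0;
}

/* Assumption: enable only takes a reference when a controller exists. */
static int toy_enable_thermal_alert(struct toy_smu *smu)
{
	if (smu->thermal_controller_type)
		return toy_irq_get(&smu->irq_source);
	return 0;
}

/* Old behaviour: always put, even if enable never took a reference. */
static int toy_disable_unguarded(struct toy_smu *smu)
{
	return toy_irq_put(&smu->irq_source);
}

/* Reviewed behaviour: put under the same condition that guarded the get. */
static int toy_disable_guarded(struct toy_smu *smu)
{
	int ret = 0;

	if (smu->thermal_controller_type)
		ret = toy_irq_put(&smu->irq_source);

	return ret;
}

/* Walks both disable variants on a device without a thermal controller. */
static void toy_run_demo(void)
{
	struct toy_smu smu = { .thermal_controller_type = 0 };

	assert(toy_enable_thermal_alert(&smu) == 0);
	assert(toy_disable_unguarded(&smu) != 0); /* unbalanced put is rejected */
	assert(toy_disable_guarded(&smu) == 0);   /* guarded path skips the put */
	assert(smu.irq_source.enabled == 0);
}
```

Calling toy_run_demo() from a main() exercises the case without a thermal
controller. With thermal_controller_type set to a nonzero value, the guarded
get and put pair up and the counter returns to zero.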
