On Tue, Nov 18, 2025 at 9:59 PM Yang Wang <[email protected]> wrote:
>
> fix amdgpu_irq enabled counter unbalanced issue on
> smu_v11_0_disable_thermal_alert.
>
> [ 357.773144] ------------[ cut here ]------------
> [ 357.773156] WARNING: CPU: 21 PID: 2202 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xd8/0xf0 [amdgpu]
> ...
> [ 357.774651] Tainted: [E]=UNSIGNED_MODULE
> [ 357.774656] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F14a
> 08/14/2020
> [ 357.774664] RIP: 0010:amdgpu_irq_put+0xd8/0xf0 [amdgpu]
> [ 357.775563] Code: 31 f6 31 ff e9 f9 c3 4f cb 44 89 f2 4c 89 e6 4c 89 ef e8
> db fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 d8 c3 4f cb <0f> 0b
> eb c3 b8 fe ff ff ff eb 97 e9 d3 8d 8b 00 0f 1f 84 00 00 00
> [ 357.775573] RSP: 0018:ffffd28616ecba58 EFLAGS: 00010246
> [ 357.775584] RAX: 0000000000000000 RBX: 0000000000000001 RCX:
> 0000000000000000
> [ 357.775592] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000000000000
> [ 357.775598] RBP: ffffd28616ecba78 R08: 0000000000000000 R09:
> 0000000000000000
> [ 357.775605] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffff8aac201a8008
> [ 357.775611] R13: ffff8aac0e600000 R14: 0000000000000000 R15:
> ffff8aac201a8000
> [ 357.775618] FS: 0000751c697b7c40(0000) GS:ffff8acb4fba2000(0000)
> knlGS:0000000000000000
> [ 357.775627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 357.775634] CR2: 00005a844a5e7028 CR3: 0000001039a0f000 CR4:
> 00000000003506f0
> [ 357.775642] Call Trace:
> [ 357.775649] <TASK>
> [ 357.775663] smu_v11_0_disable_thermal_alert+0x17/0x30 [amdgpu]
> [ 357.776704] smu_smc_hw_cleanup+0x79/0x500 [amdgpu]
> [ 357.777857] smu_hw_fini+0x139/0x200 [amdgpu]
> [ 357.778908] amdgpu_ip_block_hw_fini+0x29/0xc0 [amdgpu]
> [ 357.779698] amdgpu_device_fini_hw+0x2e5/0x560 [amdgpu]
> [ 357.780487] ? blocking_notifier_chain_unregister+0x3e/0x70
> [ 357.780511] amdgpu_driver_unload_kms+0x4b/0x70 [amdgpu]
> [ 357.781334] amdgpu_pci_remove+0x50/0x90 [amdgpu]
> [ 357.782126] pci_device_remove+0x41/0xc0
> [ 357.782145] device_remove+0x46/0x80
> [ 357.782159] device_release_driver_internal+0x203/0x270
> [ 357.782169] ? srso_return_thunk+0x5/0x5f
> [ 357.782189] driver_detach+0x4a/0xa0
> [ 357.782201] bus_remove_driver+0x83/0x110
> [ 357.782216] driver_unregister+0x31/0x60
> [ 357.782227] pci_unregister_driver+0x40/0x90
> [ 357.782244] amdgpu_exit+0x15/0x3b [amdgpu]
>
> Signed-off-by: Yang Wang <[email protected]>
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> index 78e4186d06cc..24d9f576846b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> @@ -1022,7 +1022,12 @@ int smu_v11_0_enable_thermal_alert(struct smu_context
> *smu)
>
>  int smu_v11_0_disable_thermal_alert(struct smu_context *smu)
>  {
> -       return amdgpu_irq_put(smu->adev, &smu->irq_source, 0);
> +       int ret = 0;
> +
> +       if (smu->smu_table.thermal_controller_type)
> +               ret = amdgpu_irq_get(smu->adev, &smu->irq_source, 0);

Shouldn't this be amdgpu_irq_put()?  With that fixed,
Reviewed-by: Alex Deucher <[email protected]>

> +
> +       return ret;
>  }
>
>  static uint16_t convert_to_vddc(uint8_t vid)
> --
> 2.34.1
>
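
To make the get/put pairing concrete, here is a minimal userspace sketch of
the balance the disable path needs to keep.  Everything named mock_* is a
hypothetical stand-in, not the amdgpu code.  It assumes the enable path only
takes a reference when thermal_controller_type is set, and that a put on a
source with no references warns and returns -EINVAL (the -EINVAL value is an
assumption; the trace above only shows the warning).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver structures; not the kernel types. */
struct mock_irq_src {
	int enabled;			/* models the per-source enable refcount */
};

struct mock_smu {
	int thermal_controller_type;	/* 0 means no thermal controller */
	struct mock_irq_src irq_source;
};

/* Take one reference on the interrupt source. */
static int mock_irq_get(struct mock_irq_src *src)
{
	assert(src != NULL);
	src->enabled++;
	return 0;
}

/* Drop one reference; complain if there is nothing to drop. */
static int mock_irq_put(struct mock_irq_src *src)
{
	assert(src != NULL);
	if (src->enabled == 0) {
		fprintf(stderr, "WARN: put on a source with no references\n");
		return -EINVAL;
	}
	src->enabled--;
	return 0;
}

/* Assumed shape of the enable path: get only if a controller exists. */
static int mock_enable_thermal_alert(struct mock_smu *smu)
{
	int ret = 0;

	if (smu->thermal_controller_type)
		ret = mock_irq_get(&smu->irq_source);

	return ret;
}

/* Disable mirrors enable: same guard, but put instead of get. */
static int mock_disable_thermal_alert(struct mock_smu *smu)
{
	int ret = 0;

	if (smu->thermal_controller_type)
		ret = mock_irq_put(&smu->irq_source);

	return ret;
}
```

With the same guard on both sides, an enable followed by a disable leaves
the count at zero whether or not a thermal controller is present, and the
unguarded put that triggers the warning is never reached.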
