On 08/06/2026 15:50, Thadeu Lima de Souza Cascardo wrote:
kfd_init_node/kfd_resume will end up calling init_mqd, which uses the
profiler_lock mutex before it is initialized, resulting in the warning
below when CONFIG_DEBUG_MUTEXES=y.
Moving the initialization of profiler_lock earlier in kgd2kfd_device_init
fixes the issue.
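
As a rough user-space illustration of the ordering problem (not the amdkfd
code itself), the sketch below uses a hypothetical dbg_mutex wrapper whose
magic field points back at the lock once it has been initialized, loosely
mirroring the lock->magic check in the warning. Every name in it (dbg_mutex,
demo_dev, demo_start_queues, demo_init_late, demo_init_early) is invented for
the example, and it assumes the device structure starts out zeroed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug-checked mutex: 'magic' equals the lock's own address
 * only after dbg_mutex_init() has run. */
struct dbg_mutex {
	pthread_mutex_t lock;
	struct dbg_mutex *magic;
};

/* Hypothetical device with a lock that an early init path needs. */
struct demo_dev {
	struct dbg_mutex profiler_lock;
	int queues_ready;
};

static void dbg_mutex_init(struct dbg_mutex *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->magic = m;
}

/* Returns false instead of locking if the mutex was never initialized. */
static bool dbg_mutex_lock(struct dbg_mutex *m)
{
	if (m->magic != m)
		return false;
	pthread_mutex_lock(&m->lock);
	return true;
}

static void dbg_mutex_unlock(struct dbg_mutex *m)
{
	pthread_mutex_unlock(&m->lock);
}

/* Stand-in for an early path that takes the lock during device init. */
static bool demo_start_queues(struct demo_dev *dev)
{
	if (!dbg_mutex_lock(&dev->profiler_lock))
		return false;
	dev->queues_ready = 1;
	dbg_mutex_unlock(&dev->profiler_lock);
	return true;
}

/* Problematic order: the lock is used first and initialized last. */
static bool demo_init_late(struct demo_dev *dev)
{
	bool ok = demo_start_queues(dev);

	dbg_mutex_init(&dev->profiler_lock);
	return ok;
}

/* Fixed order: the lock is initialized before anything can take it. */
static bool demo_init_early(struct demo_dev *dev)
{
	dbg_mutex_init(&dev->profiler_lock);
	return demo_start_queues(dev);
}
```

With a zeroed struct demo_dev, demo_init_late() returns false because the
lock is taken before dbg_mutex_init() has run, while demo_init_early()
returns true.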
Beat you to it, sorry! ;)
cd0e76a2f60e ("amd/amdkfd: Fix profiler lock init order")
Regards,
Tvrtko
[ 13.121334] kfd kfd: Allocated 3969056 bytes on gart
[ 13.121439] kfd kfd: Total number of KFD nodes to be created: 1
[ 13.122509] ------------[ cut here ]------------
[ 13.122523] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 13.122524] WARNING: kernel/locking/mutex.c:625 at
__mutex_lock+0x623/0x1160, CPU#2: (udev-worker)/598
[ 13.122544] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 joydev
snd_soc_acp5x_mach algif_hash algif_skcipher snd_acp5x_pcm_dma snd_acp5x_i2s
af_alg mousedev snd_sof_amd_acp70 ramoops nf_tables reed_solomon bnep
snd_sof_amd_acp63 hid_multitouch intel_rapl_msr amdgpu(+) intel_rapl_common
snd_sof_amd_vangogh snd_sof_amd_acp snd_sof_pci btusb btrtl snd_sof
rtw88_8822ce btintel snd_sof_utils i2c_algo_bit rtw88_8822c kvm_amd btbcm
snd_sof_xtensa_dsp rtw88_pci drm_buddy btmtk hid_steam drm_ttm_helper
snd_pci_ps snd_hda_codec_atihdmi rtw88_core ttm snd_soc_acpi_amd_match
snd_hda_codec_hdmi mac80211 kvm snd_soc_acpi_amd_sdca_quirks ff_memless
bluetooth snd_hda_intel libarc4 drm_exec cdc_acm snd_soc_sdca ecdh_generic
snd_hda_codec sp5100_tco irqbypass snd_soc_cs35l41_spi snd_acp_pci
drm_suballoc_helper aesni_intel snd_soc_cs35l41 snd_soc_cs35l41_lib
drm_panel_backlight_quirks
[ 13.122617] snd_amd_acpi_mach gf128mul atkbd snd_acp_legacy_common
snd_hwdep snd_soc_nau8821 snd_soc_wm_adsp gpu_sched snd_hda_core rapl
snd_pci_acp6x cfg80211 libps2 snd_soc_core i2c_piix4 snd_intel_dspcfg amdxcp
video vivaldi_fmap snd_compress pcspkr wdat_wdt opt3001 ltrf216a wmi i2c_smbus
rfkill cs_dsp drm_display_helper snd_pcm i2c_hid_acpi snd_timer industrialio
snd_pci_acp5x i2c_hid snd snd_acp_config cec soundcore snd_soc_acpi 8250_dw ccp
mac_hid pkcs8_key_parser crypto_user loop fuse dm_mod nfnetlink zram
842_decompress lz4hc_compress 842_compress overlay ext4 crc16 mbcache jbd2
usbhid vfat fat btrfs xor libblake2b raid6_pq sdhci_pci sdhci_uhs2 serio_raw
sdhci xhci_pci cqhci nvme xhci_hcd mmc_core nvme_core i8042 serio spi_amd
[ 13.122778] CPU: 2 UID: 0 PID: 598 Comm: (udev-worker) Not tainted
7.1.0-rc5-g17cdb54644e7 #95 PREEMPT fe7e422e25ce48c0eeff34bf50e2cbbb74b08f52
[ 13.122792] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0133 08/05/2024
[ 13.122799] RIP: 0010:__mutex_lock+0x62a/0x1160
[ 13.122807] Code: ff e8 ba 2c 87 ff 85 c0 0f 84 95 fa ff ff 8b 05 fc d1 b5 00 85
c0 0f 85 87 fa ff ff 48 8d 3d 5d ae b6 00 48 c7 c6 85 eb 63 a9 <67> 48 0f b9 3a
e9 6f fa ff ff 48 8b 7d 80 e8 f3 8a 00 00 41 f7 c5
[ 13.122823] RSP: 0018:ffffcdbdc2567560 EFLAGS: 00010246
[ 13.122830] RAX: 0000000000000000 RBX: ffff8c055f1486d8 RCX: 0000000000000000
[ 13.122837] RDX: 0000000000000001 RSI: ffffffffa963eb85 RDI: ffffffffa9a727f0
[ 13.122843] RBP: ffffcdbdc2567610 R08: ffffffffc1a7c8f0 R09: 0000000000000000
[ 13.122850] R10: ffffcdbdc2567628 R11: 0000000000000002 R12: 0000000000000000
[ 13.122857] R13: 0000000000000002 R14: ffff8c0567139200 R15: 0000000000000000
[ 13.122863] FS: 00007f9e0ffdf8c0(0000) GS:ffff8c08c4cf7000(0000)
knlGS:0000000000000000
[ 13.122872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.122877] CR2: 00007f209b4917c8 CR3: 0000000109705000 CR4: 0000000000350ef0
[ 13.122885] Call Trace:
[ 13.122889] <TASK>
[ 13.122892] ? mark_held_locks+0x40/0x70
[ 13.122902] ? init_mqd+0x140/0x1b0 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.123412] ? lockdep_hardirqs_on+0x78/0x100
[ 13.123424] ? init_mqd+0x140/0x1b0 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.123896] init_mqd+0x140/0x1b0 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.124236] init_mqd_hiq+0x12/0x30 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.124570] kq_initialize.constprop.0+0x2f3/0x3a0 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.124908] kernel_queue_init+0x44/0x60 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.125268] pm_init+0x70/0x100 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.125653] start_cpsch+0x1d7/0x270 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.125994] kgd2kfd_device_init.cold+0x7a7/0xa02 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.126387] amdgpu_amdkfd_device_init+0x193/0x260 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.126727] amdgpu_device_init.cold+0x18c7/0x1d94 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.127118] amdgpu_driver_load_kms+0x19/0x80 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.127412] amdgpu_pci_probe+0x204/0x440 [amdgpu
124ada0c0ee626a38601e4af30deafe4f3d26a19]
[ 13.127698] local_pci_probe+0x3c/0x80
[ 13.127706] pci_call_probe+0x55/0x2e0
[ 13.127712] ? _raw_spin_unlock+0x2d/0x50
[ 13.127717] ? pci_match_device+0x157/0x180
[ 13.127722] pci_device_probe+0x9b/0x170
[ 13.127727] really_probe+0xd5/0x370
[ 13.127733] ? __device_attach_driver+0x120/0x120
[ 13.127738] __driver_probe_device+0x84/0x150
[ 13.127742] driver_probe_device+0x1f/0xa0
[ 13.127747] __driver_attach+0xb3/0x1e0
[ 13.127752] bus_for_each_dev+0x8e/0xe0
[ 13.127757] bus_add_driver+0x11e/0x200
[ 13.127762] driver_register+0x72/0xc0
[ 13.127768] ? nft_reject_icmpv6_code+0xed0/0xed0 [nft_reject
0e902f0803e5bbdfadf527319d5a2d5ea2df373c]
[ 13.127775] do_one_initcall+0x6e/0x3a0
[ 13.127782] do_init_module+0x60/0x230
[ 13.127787] init_module_from_file+0xc4/0xe0
[ 13.127794] idempotent_init_module+0x11a/0x310
[ 13.127801] __x64_sys_finit_module+0x71/0xe0
[ 13.127806] do_syscall_64+0x122/0x710
[ 13.127812] ? __seccomp_filter+0x42/0x5d0
[ 13.127819] ? do_syscall_64+0xd1/0x710
[ 13.127824] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 13.127830] RIP: 0033:0x7f9e10860f6d
[ 13.127835] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 73 01 c3 48 8b 0d 73 ed 0c 00 f7 d8 64 89 01 48
[ 13.127844] RSP: 002b:00007ffc537eaed8 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[ 13.127851] RAX: ffffffffffffffda RBX: 000056344d23c3f0 RCX: 00007f9e10860f6d
[ 13.127855] RDX: 0000000000000000 RSI: 000056344d23dc60 RDI: 000000000000003a
[ 13.127859] RBP: 00007ffc537eaf70 R08: 0000000000000000 R09: 00007ffc537eaf40
[ 13.127863] R10: 0000000000000000 R11: 0000000000000246 R12: 000056344d23dc60
[ 13.127867] R13: 0000000000020000 R14: 000056344d23ab40 R15: 0000000000000000
[ 13.127874] </TASK>
[ 13.127877] irq event stamp: 603391
[ 13.127880] hardirqs last enabled at (603391): [<ffffffffa8f105cc>]
_raw_spin_unlock_irqrestore+0x4c/0x60
[ 13.127887] hardirqs last disabled at (603390): [<ffffffffa8f10343>]
_raw_spin_lock_irqsave+0x53/0x60
[ 13.127892] softirqs last enabled at (601682): [<ffffffffa8103402>]
__irq_exit_rcu+0xf2/0x190
[ 13.127900] softirqs last disabled at (601671): [<ffffffffa8103402>]
__irq_exit_rcu+0xf2/0x190
[ 13.127906] ---[ end trace 0000000000000000 ]---
[ 13.127977] amdgpu: Virtual CRAT table created for GPU
[ 13.129101] amdgpu: Topology: Add GPU node [0x1002:0x163f]
[ 13.129117] kfd kfd: added device 1002:163f
Fixes: a789761de305 ("amd/amdkfd: Add kfd_ioctl_profiler to contain profiler kernel driver changes")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index c2c59781feee..8b2039bcbc4d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -736,6 +736,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
int partition_mode;
int xcp_idx;
+ mutex_init(&kfd->profiler_lock);
+
kfd->mec_fw_version = amdgpu_amdkfd_get_fw_version(kfd->adev,
KGD_ENGINE_MEC1);
kfd->mec2_fw_version = amdgpu_amdkfd_get_fw_version(kfd->adev,
@@ -937,7 +939,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
svm_range_set_max_pages(kfd->adev);
kfd->profiler_process = NULL;
- mutex_init(&kfd->profiler_lock);
kfd->init_complete = true;
dev_info(kfd_device, "added device %x:%x\n", kfd->adev->pdev->vendor,
---
base-commit: 17cdb54644e7d92b62cff1c4d1bd3d1486515f68
change-id: 20260604-amdgpu-mutex-fix-73636d10f6a7
Best regards,
--
Thadeu Lima de Souza Cascardo <[email protected]>