Control: tags -1 + moreinfo

Hi,

On Sun, Jan 28, 2024 at 06:02:44PM +0000, Breno Leitao wrote:
> Package: src:linux
> Version: 6.6.13-1
> Severity: critical
> X-Debbugs-Cc: lei...@debian.org
>
> System is crashing from time to time with the most recent kernel
> (6.6.13).
>
> I was able to get the last kernel messages, and it is related to
> dma-iommu. I am not sure why the system is crashing, since I didn't have
> kdump, but, there is a clear warning in the wifi driver.
>
> Jan 28 17:05:21.414052 xeon kernel: Process accounting resumed
> Jan 28 17:05:21.530027 xeon kernel: warning: `atop' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211
> Jan 28 17:05:21.550117 xeon kernel: espeakup[1527]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
> Jan 28 17:05:21.586066 xeon kernel: NET: Registered PF_QIPCRTR protocol family
> Jan 28 17:05:21.606106 xeon kernel: block nvme3n1: No UUID available providing old NGUID
> Jan 28 17:05:25.354046 xeon kernel: rfkill: input handler disabled
> Jan 28 17:05:27.294079 xeon kernel: wlp134s0: authenticate with 80:72:15:b4:aa:6d
> Jan 28 17:05:27.694105 xeon kernel: wlp134s0: send auth to 80:72:15:b4:aa:6d (try 1/3)
> Jan 28 17:05:27.702030 xeon kernel: wlp134s0: authenticated
> Jan 28 17:05:27.710089 xeon kernel: wlp134s0: associate with 80:72:15:b4:aa:6d (try 1/3)
> Jan 28 17:05:27.718085 xeon kernel: wlp134s0: RX AssocResp from 80:72:15:b4:aa:6d (capab=0x1011 status=0 aid=6)
> Jan 28 17:05:27.718145 xeon kernel: wlp134s0: associated
> Jan 28 17:05:27.730080 xeon kernel: wlp134s0: Limiting TX power to 23 (23 - 0) dBm as advertised by 80:72:15:b4:aa:6d
> Jan 28 17:05:31.799817 xeon systemd-journald[764]: /var/log/journal/338f646113274ac1b9a4e000c0f8c95c/user-1000.journal: Journal file uses a different sequence number ID, rotating.
> Jan 28 17:05:32.074056 xeon kernel: rfkill: input handler enabled
> Jan 28 17:05:33.862039 xeon kernel: rfkill: input handler disabled
> Jan 28 17:05:45.370136 xeon kernel: logitech-hidpp-device 0003:046D:406D.0008: HID++ 4.5 device connected.
> Jan 28 17:40:26.302054 xeon kernel: rtlwifi: AP off, try to reconnect now
> Jan 28 17:40:26.302220 xeon kernel: wlp134s0: Connection to AP 80:72:15:b4:aa:6d lost
> Jan 28 17:40:30.661645 xeon kernel: wlp134s0: authenticate with 80:72:15:b4:aa:6a
> Jan 28 17:40:30.661756 xeon kernel: wlp134s0: 80 MHz not supported, disabling VHT
> Jan 28 17:40:30.680151 xeon kernel: wlp134s0: send auth to 80:72:15:b4:aa:6a (try 1/3)
> Jan 28 17:40:30.686063 xeon kernel: wlp134s0: authenticated
> Jan 28 17:40:30.686136 xeon kernel: wlp134s0: associate with 80:72:15:b4:aa:6a (try 1/3)
> Jan 28 17:40:30.696593 xeon kernel: wlp134s0: RX AssocResp from 80:72:15:b4:aa:6a (capab=0x1411 status=0 aid=2)
> Jan 28 17:40:30.702085 xeon kernel: wlp134s0: associated
> Jan 28 17:40:36.710058 xeon kernel: wlp134s0: deauthenticated from 80:72:15:b4:aa:6a (Reason: 2=PREV_AUTH_NOT_VALID)
> Jan 28 17:42:36.946218 xeon kernel: ------------[ cut here ]------------
> Jan 28 17:42:36.946357 xeon kernel: WARNING: CPU: 37 PID: 1366 at drivers/iommu/dma-iommu.c:1091 iommu_dma_unmap_page+0x7d/0x90
> Jan 28 17:42:36.946403 xeon kernel: Modules linked in: ccm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rtl8821ae btcoexist irqbypass rtl_pci ghash_clmulni_intel rtlwifi sha512_ssse3 sha256_ssse3 mac80211 sha1_ssse3 snd_hda_codec_realtek aesni_intel snd_hda_codec_generic crypto_simd cryptd snd_hda_codec_hdmi ledtrig_audio libarc4 snd_hda_intel rapl snd_intel_dspcfg snd_intel_sdw_acpi intel_cstate snd_hda_codec cfg80211 snd_hda_core snd_hwdep iTCO_wdt snd_pcm rfkill snd_timer intel_pmc_bxt mei_me intel_uncore iTCO_vendor_support snd pcspkr ioatdma mei watchdog soundcore intel_pch_thermal dca joydev acpi_pad acpi_power_meter sg evdev msr parport_pc ppdev lp parport loop nvme_fabrics dm_mod efi_pstore configfs nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic speakup_soft speakup hid_logitech_hidpp hid_logitech_dj
> Jan 28 17:42:36.946556 xeon kernel: nouveau sr_mod hid_generic sd_mod cdrom usbhid drm_exec hid gpu_sched video nvme i2c_algo_bit drm_display_helper nvme_core cec t10_pi rc_core drm_ttm_helper ahci ttm xhci_pci crc64_rocksoft drm_kms_helper crc64 libahci xhci_hcd crc_t10dif libata crct10dif_generic drm mxm_wmi usbcore crc32_pclmul scsi_mod crct10dif_pclmul i2c_i801 crc32c_intel crct10dif_common lpc_ich vmd i2c_smbus usb_common scsi_common wmi button
> Jan 28 17:42:36.946611 xeon kernel: CPU: 37 PID: 1366 Comm: NetworkManager Not tainted 6.6.13-amd64 #1 Debian 6.6.13-1
> Jan 28 17:42:36.946648 xeon kernel: Hardware name: ASUSTeK COMPUTER INC. WS-C621E-SAGE Series/WS-C621E-SAGE Series, BIOS 6801 04/26/2022
> Jan 28 17:42:36.946685 xeon kernel: RIP: 0010:iommu_dma_unmap_page+0x7d/0x90
> Jan 28 17:42:36.946721 xeon kernel: Code: 2b 48 3b 28 72 26 48 3b 68 08 73 20 4d 89 f8 44 89 f1 4c 89 ea 48 89 ee 48 89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 83 ed 8f ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 66 90 90 90 90
> Jan 28 17:42:36.946761 xeon kernel: RSP: 0018:ffffb034a5cc7440 EFLAGS: 00010046
> Jan 28 17:42:36.946796 xeon kernel: RAX: 0000000000000000 RBX: ffff90a2e01070c0 RCX: 0000000000000012
> Jan 28 17:42:36.946832 xeon kernel: RDX: 0000000000000000 RSI: ffff90ba5569b000 RDI: 0000000000000000
> Jan 28 17:42:36.946868 xeon kernel: RBP: ffff90ba50140900 R08: 0000000000000000 R09: 0000000000000003
> Jan 28 17:42:36.946903 xeon kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> Jan 28 17:42:36.947204 xeon kernel: R13: 00000000000009d8 R14: 0000000000000001 R15: 0000000000000000
> Jan 28 17:42:36.947377 xeon kernel: FS: 00007f5766591500(0000) GS:ffff90d1dfc40000(0000) knlGS:0000000000000000
> Jan 28 17:42:36.947416 xeon kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jan 28 17:42:36.947452 xeon kernel: CR2: 00007fa83488e000 CR3: 000000010f5e0001 CR4: 00000000007726e0
> Jan 28 17:42:36.947487 xeon kernel: PKRU: 55555554
> Jan 28 17:42:36.947522 xeon kernel: Call Trace:
> Jan 28 17:42:36.947557 xeon kernel: <TASK>
> Jan 28 17:42:36.947592 xeon kernel: ? iommu_dma_unmap_page+0x7d/0x90
> Jan 28 17:42:36.947627 xeon kernel: ? __warn+0x81/0x130
> Jan 28 17:42:36.947658 xeon kernel: ? iommu_dma_unmap_page+0x7d/0x90
> Jan 28 17:42:36.947688 xeon kernel: ? report_bug+0x171/0x1a0
> Jan 28 17:42:36.947723 xeon kernel: ? handle_bug+0x3c/0x80
> Jan 28 17:42:36.947758 xeon kernel: ? exc_invalid_op+0x17/0x70
> Jan 28 17:42:36.947799 xeon kernel: ? asm_exc_invalid_op+0x1a/0x20
> Jan 28 17:42:36.947836 xeon kernel: ? iommu_dma_unmap_page+0x7d/0x90
> Jan 28 17:42:36.947866 xeon kernel: rtl_pci_reset_trx_ring+0x195/0x390 [rtl_pci]
> Jan 28 17:42:36.947901 xeon kernel: rtl_ps_enable_nic+0x29/0x120 [rtlwifi]
> Jan 28 17:42:36.947936 xeon kernel: rtl8821ae_phy_set_rf_power_state+0x71/0x2d0 [rtl8821ae]
> Jan 28 17:42:36.947967 xeon kernel: rtl_ps_set_rf_state.isra.0+0xbb/0xf0 [rtlwifi]
> Jan 28 17:42:36.948002 xeon kernel: _rtl_ps_inactive_ps+0x36/0xd0 [rtlwifi]
> Jan 28 17:42:36.948032 xeon kernel: rtl_ips_nic_on+0x7c/0xc0 [rtlwifi]
> Jan 28 17:42:36.948068 xeon kernel: rtl_op_stop+0xfd/0x110 [rtlwifi]
> Jan 28 17:42:36.948108 xeon kernel: drv_stop+0x34/0x100 [mac80211]
> Jan 28 17:42:36.948143 xeon kernel: ieee80211_do_stop+0x5df/0x8a0 [mac80211]
> Jan 28 17:42:36.948177 xeon kernel: ieee80211_stop+0x4d/0x180 [mac80211]
> Jan 28 17:42:36.948212 xeon kernel: __dev_close_many+0x9b/0x110
> Jan 28 17:42:36.948242 xeon kernel: __dev_change_flags+0x1a6/0x240
> Jan 28 17:42:36.948276 xeon kernel: dev_change_flags+0x26/0x70
> Jan 28 17:42:36.948311 xeon kernel: do_setlink+0x39c/0x12d0
> Jan 28 17:42:36.948341 xeon kernel: ? intel_iommu_iotlb_sync_map+0x8d/0xe0
> Jan 28 17:42:36.948371 xeon kernel: ? __nla_validate_parse+0x61/0xd10
> Jan 28 17:42:36.948401 xeon kernel: ? update_load_avg+0x7e/0x780
> Jan 28 17:42:36.948436 xeon kernel: __rtnl_newlink+0x651/0xa10
> Jan 28 17:42:36.948470 xeon kernel: ? sched_clock+0x10/0x30
> Jan 28 17:42:36.948507 xeon kernel: ? __kmem_cache_alloc_node+0x196/0x330
> Jan 28 17:42:36.948542 xeon kernel: ? rtnl_newlink+0x2e/0x70
> Jan 28 17:42:36.948577 xeon kernel: rtnl_newlink+0x47/0x70
> Jan 28 17:42:36.948611 xeon kernel: rtnetlink_rcv_msg+0x14f/0x3c0
> Jan 28 17:42:36.948645 xeon kernel: ? path_lookupat+0x96/0x1a0
> Jan 28 17:42:36.948680 xeon kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> Jan 28 17:42:36.948715 xeon kernel: netlink_rcv_skb+0x58/0x110
> Jan 28 17:42:36.948745 xeon kernel: netlink_unicast+0x1a3/0x290
> Jan 28 17:42:36.948776 xeon kernel: netlink_sendmsg+0x254/0x4d0
> Jan 28 17:42:36.948806 xeon kernel: ____sys_sendmsg+0x396/0x3d0
> Jan 28 17:42:36.948841 xeon kernel: ? copy_msghdr_from_user+0x7d/0xc0
> Jan 28 17:42:36.948871 xeon kernel: ___sys_sendmsg+0x9a/0xe0
> Jan 28 17:42:36.948905 xeon kernel: __sys_sendmsg+0x7a/0xd0
> Jan 28 17:42:36.948941 xeon kernel: do_syscall_64+0x5d/0xc0
> Jan 28 17:42:36.948975 xeon kernel: ? syscall_exit_to_user_mode+0x2b/0x40
> Jan 28 17:42:36.949009 xeon kernel: ? do_syscall_64+0x6c/0xc0
> Jan 28 17:42:36.949043 xeon kernel: ? do_syscall_64+0x6c/0xc0
> Jan 28 17:42:36.949073 xeon kernel: ? __fget_light+0x99/0x100
> Jan 28 17:42:36.949107 xeon kernel: ? ksys_write+0xd8/0xf0
> Jan 28 17:42:36.949137 xeon kernel: ? exit_to_user_mode_prepare+0x40/0x1e0
> Jan 28 17:42:36.949171 xeon kernel: ? syscall_exit_to_user_mode+0x2b/0x40
> Jan 28 17:42:36.949201 xeon kernel: ? do_syscall_64+0x6c/0xc0
> Jan 28 17:42:36.949230 xeon kernel: ? exit_to_user_mode_prepare+0x40/0x1e0
> Jan 28 17:42:36.949260 xeon kernel: ? syscall_exit_to_user_mode+0x2b/0x40
> Jan 28 17:42:36.949290 xeon kernel: ? do_syscall_64+0x6c/0xc0
> Jan 28 17:42:36.949319 xeon kernel: ? do_syscall_64+0x6c/0xc0
> Jan 28 17:42:36.949348 xeon kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> Jan 28 17:42:36.949384 xeon kernel: RIP: 0033:0x7f576788ba5d
> Jan 28 17:42:36.949420 xeon kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 1a a0 f7 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 6e a0 f7 ff 48
> Jan 28 17:42:36.949448 xeon kernel: RSP: 002b:00007ffdd6e2ece0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> Jan 28 17:42:36.949483 xeon kernel: RAX: ffffffffffffffda RBX: 000055d226a1c3f0 RCX: 00007f576788ba5d
> Jan 28 17:42:36.949517 xeon kernel: RDX: 0000000000000000 RSI: 00007ffdd6e2ed30 RDI: 000000000000000d
> Jan 28 17:42:36.949551 xeon kernel: RBP: 00007ffdd6e2ed30 R08: 0000000000000000 R09: 0000000000000000
> Jan 28 17:42:36.949586 xeon kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000040
> Jan 28 17:42:36.949616 xeon kernel: R13: 000055d226a9f3a0 R14: 0000000000000000 R15: 0000000000000000
> Jan 28 17:42:36.949650 xeon kernel: </TASK>
> Jan 28 17:42:36.949684 xeon kernel: ---[ end trace 0000000000000000 ]---

Can you check if this happens as well with 6.7.1-1~exp1 in
experimental?

As I understand this is a regression from 6.6.11-1 to 6.6.13-1, any
chance you could bisect the upstream versions between 6.6.11 and
6.6.13 to identify the culprit?

Regards,
Salvatore