Dear IOMMU Subsystem Maintainers,

I have been debugging an issue with Nathan Langford (CC'd here) for some months now, along with Alex Williamson on the linux-pci mailing list, and I just wanted to check that we aren't also running into an IOMMU bug when enabling IRQ remapping in the crashkernel.
Nathan has a system with 8x 2080 Ti graphics cards, and we are passing through multiple GPUs to a KVM VM via vfio-pci. When we pass through 2x GPUs that share the same upstream PCI switch and reboot the VM a handful of times, an IRQ storm occurs and locks up the host system.

System information:
- SuperMicro X9DRG-O(T)F
- 8x Nvidia GeForce RTX 2080 Ti GPUs
- Ubuntu 20.04 LTS
- 5.14.0 mainline kernel
- libvirt 6.0.0-0ubuntu8.10
- qemu 4.2-3ubuntu6.16
- intel_iommu=on

In the logs we see:

irq 31: nobody cared (try booting with the "irqpoll" option)
Call Trace:
 <IRQ>
 dump_stack_lvl+0x4a/0x5f
 dump_stack+0x10/0x12
 __report_bad_irq+0x3a/0xaf
 note_interrupt.cold+0xb/0x60
 handle_irq_event_percpu+0x72/0x80
 handle_irq_event+0x3b/0x60
 handle_fasteoi_irq+0x9c/0x150
 __common_interrupt+0x4b/0xb0
 common_interrupt+0x4a/0xa0
 asm_common_interrupt+0x1e/0x40
RIP: 0010:__do_softirq+0x73/0x2ae
handlers:
[<00000000b16da31d>] vfio_intx_handler
Disabling IRQ #31

Extra details on LKML / linux-pci:
https://lkml.org/lkml/2021/9/13/85

Now, Nathan has "kernel.hardlockup_panic = 1" set, which causes the kernel to panic and reboot into the crashkernel, and this is where the IOMMU issues begin. The crashkernel loads and gets as far as:

DMAR: Host address width 46
DMAR: DRHD base: 0x000000fbffe000 flags: 0x0
DMAR: dmar0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0466 ecap f020de
DMAR: DRHD base: 0x000000cbffc000 flags: 0x1
DMAR: dmar1: reg_base_addr cbffc000 ver 1:0 cap d2078c106f0466 ecap f020de
DMAR: RMRR base: 0x0000005f21a000 end: 0x0000005f228fff
DMAR: ATSR flags: 0x0
DMAR: RHSA base: 0x000000fbffe000 proximity domain: 0x1
DMAR: RHSA base: 0x000000cbffc000 proximity domain: 0x0
DMAR-IR: IOAPIC id 3 under DRHD base 0xfbffe000 IOMMU 0
DMAR-IR: IOAPIC id 0 under DRHD base 0xcbffc000 IOMMU 1
DMAR-IR: IOAPIC id 2 under DRHD base 0xcbffc000 IOMMU 1
DMAR-IR: HPET id 0 under DRHD base 0xcbffc000
[    3.271530] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    3.282572] DMAR-IR: Copied IR table for dmar0 from previous kernel
[   13.291319] DMAR-IR: Copied IR table for dmar1 from previous kernel

I added the timestamps for the last couple of entries. There is a ten second hang between copying the IR table for dmar0 and copying the IR table for dmar1. After this, the kernel just hangs, and the system has to be hard rebooted.

Full dmesg:
https://paste.ubuntu.com/p/M7Bdyk9YV7/

We never see the next message that usually appears with a plain old sysrq-trigger crash, which is:

DMAR-IR: Enabled IRQ remapping in x2apic mode

Would an ongoing IRQ storm prevent IRQ remapping from being enabled?

From my understanding, when we start the crashkernel, PCI devices are in an undefined state and could keep sending DMA or IRQ requests to the crashkernel. That could break things through data corruption, or cause IRQs to be blocked if we receive too many spurious interrupts, which would in turn cause problems when we try to re-initialise those PCI devices while their IRQs are blocked. This is why we copy the old IR tables from the dmar regions and unblock blocked IRQs. But if an IRQ storm is ongoing, is there anything we can really do? Is it a bug to just hang here, or is it an indication that the system administrator needs to go and do a full hardware reset?

Please let us know if you need any additional debugging information; we can build patched kernels if you need extra debug output.

Thanks,
Matthew

_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
