On Thu, Dec 3, 2020 at 1:18 AM Marc Smith <[email protected]> wrote:
>
> Hi,
>
> First, I must preface this email by apologizing in advance for asking
> about a distro kernel (RHEL in this case); I'm not truly reporting this
> problem and requesting a fix here (I know that should be taken up with
> the vendor), but rather hoping someone can give me a few hints/pointers
> on where to look next for debugging this issue.
>
> I'm using RHEL 7.8.2003 (CentOS) with a 3.10.0-1127.18.2.el7 kernel.
> The systems use a Supermicro H12SSW-NT board (AMD), and we have the
> IOMMU enabled along with SR-IOV. I have several virtual machines (QEMU
> KVM) that run on these servers, and I'm passing PCIe endpoints into
> the VMs (in some cases the whole PCIe EP itself; for some devices
> I use SR-IOV and pass the VFs into the VMs). The VMs run Linux as
> their guest OS (a couple of different distros).
>
> While the servers (VMs) are idle, I don't experience any problems. But
> when I start doing a lot of I/O in the virtual machines (iSCSI across
> Ethernet interfaces, disk I/O via SAS HBAs that are passed into the
> VM, etc.),
> I notice the following after some time at the host layer ("hypervisor"):
>
> Nov 29 10:50:00 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=42:00.0 domain=0x005e address=0xfffffffdf8030000 flags=0x0008]
> Nov 29 22:02:03 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=c8:02.1 domain=0x005f address=0xfffffffdf8060000 flags=0x0008]
> Nov 30 02:13:54 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=42:00.0 domain=0x005e address=0xfffffffdf8020000 flags=0x0008]
> Nov 30 02:28:44 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=c8:02.0 domain=0x005e address=0xfffffffdf8020000 flags=0x0008]
> Nov 30 10:48:53 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x005e address=0xfffffffdf8040000 flags=0x0008]
> Dec  2 07:05:22 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=c8:03.0 domain=0x005e address=0xfffffffdf8010000 flags=0x0008]
>
> These events happen for all of the PCIe devices that are passed into
> the VMs, although not all at once... as you can see from the timestamps
> above, they are not very frequent even under heavy load (in the log
> snippet above, the system was running a big workload over several
> days). For the Ethernet devices that are passed into the VMs, I noticed
> that they experience transmit hangs / resets in the virtual machines,
> and when these occur, each one corresponds to a matching IO_PAGE_FAULT
> for that PCI device.
>
> FWIW, those NIC hangs look like this (visible in the VM guest OS):
>
> [17879.279091] NETDEV WATCHDOG: s1p1 (bnxt_en): transmit queue 2 timed out
> [17879.279111] WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x121/0x17e
> ...
> [17879.279213] bnxt_en 0000:01:09.0 s1p1: TX timeout detected, starting reset task!
> [17883.075299] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
> [17883.075302] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 1 failed.
> rc:fffffff0 err:0
> [17886.957100] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
> [17886.957103] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
> [17890.843023] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
> [17890.843025] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 2 failed. rc:fffffff0 err:0
>
> We see these NIC hangs in the VMs with both Broadcom and Mellanox
> Ethernet adapters that are passed into the VMs, so I don't think it's
> the NICs causing the IO_PAGE_FAULT events observed in the hypervisor.
> Plus, we see IO_PAGE_FAULTs for devices other than Ethernet adapters.
>
> I have several of these same servers (all using the same motherboard,
> processor, memory, BIOS, etc.) and they all exhibit this behavior
> with the IO_PAGE_FAULT events, so I don't believe it to be any one
> faulty server / component. I guess my question is: I'm not sure where
> to dig/push next. Is this perhaps an issue with the BIOS/firmware on
> these motherboards? Something with the chipset (AMD IOMMU)? A
> colleague has suggested that even the AGESA may be related. Or should
> I be focusing on the Linux kernel, specifically the AMD IOMMU driver
> (software)?
>
> I've been poking around other similar bug reports, and the
> IO_PAGE_FAULT events and NIC resets / transmit hangs seem to be
> related in other posts as well. This commit looked promising:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e50ce03976fbc8ae995a000c4b10c737467beaa
>
> But I see RH has already back-ported it into their
> 3.10.0-1127.18.2.el7 kernel source. I'm open to trying a newer vanilla
> Linux kernel (e.g., 5.4.x) but would prefer to resolve this in the
> RHEL kernel I'm using now. I'll take a look at this next, although due
> to the complex nature of this hypervisor/VM setup, it's a bit tedious
> to test.
>
> Kernel messages from boot (using the amd_iommu_dump=1 parameter):
>
> ...
> [ 0.214395] AMD-Vi: Using IVHD type 0x11
> [ 0.214627] AMD-Vi: device: c0:00.2 cap: 0040 seg: 0 flags: b0 info 0000
> [ 0.214628] AMD-Vi:  mmio-addr: 00000000f3700000
> [ 0.214634] AMD-Vi:  DEV_SELECT_RANGE_START devid: c0:01.0 flags: 00
> [ 0.214635] AMD-Vi:  DEV_RANGE_END devid: ff:1f.6
> [ 0.214763] AMD-Vi:  DEV_SPECIAL(IOAPIC[241]) devid: c0:00.1
> [ 0.214765] AMD-Vi: device: 80:00.2 cap: 0040 seg: 0 flags: b0 info 0000
> [ 0.214766] AMD-Vi:  mmio-addr: 00000000f2600000
> [ 0.214771] AMD-Vi:  DEV_SELECT_RANGE_START devid: 80:01.0 flags: 00
> [ 0.214772] AMD-Vi:  DEV_RANGE_END devid: bf:1f.6
> [ 0.214900] AMD-Vi:  DEV_SPECIAL(IOAPIC[242]) devid: 80:00.1
> [ 0.214901] AMD-Vi: device: 40:00.2 cap: 0040 seg: 0 flags: b0 info 0000
> [ 0.214902] AMD-Vi:  mmio-addr: 00000000b4800000
> [ 0.214906] AMD-Vi:  DEV_SELECT_RANGE_START devid: 40:01.0 flags: 00
> [ 0.214907] AMD-Vi:  DEV_RANGE_END devid: 7f:1f.6
> [ 0.215036] AMD-Vi:  DEV_SPECIAL(IOAPIC[243]) devid: 40:00.1
> [ 0.215037] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000
> [ 0.215038] AMD-Vi:  mmio-addr: 00000000fc800000
> [ 0.215044] AMD-Vi:  DEV_SELECT_RANGE_START devid: 00:01.0 flags: 00
> [ 0.215045] AMD-Vi:  DEV_RANGE_END devid: 3f:1f.6
> [ 0.215173] AMD-Vi:  DEV_ALIAS_RANGE devid: ff:00.0 flags: 00 devid_to: 00:14.4
> [ 0.215174] AMD-Vi:  DEV_RANGE_END devid: ff:1f.7
> [ 0.215179] AMD-Vi:  DEV_SPECIAL(HPET[0]) devid: 00:14.0
> [ 0.215180] AMD-Vi:  DEV_SPECIAL(IOAPIC[240]) devid: 00:14.0
> [ 0.215181] AMD-Vi:  DEV_SPECIAL(IOAPIC[244]) devid: 00:00.1
> ...
> [ 4.345723] AMD-Vi: Found IOMMU at 0000:c0:00.2 cap 0x40
> [ 4.345724] AMD-Vi: Extended features (0x58f77ef22294ade):
> [ 4.345724]  PPR X2APIC NX GT IA GA PC GA_vAPIC
> [ 4.345728] AMD-Vi: Found IOMMU at 0000:80:00.2 cap 0x40
> [ 4.345729] AMD-Vi: Extended features (0x58f77ef22294ade):
> [ 4.345729]  PPR X2APIC NX GT IA GA PC GA_vAPIC
> [ 4.345731] AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40
> [ 4.345732] AMD-Vi: Extended features (0x58f77ef22294ade):
> [ 4.345733]  PPR X2APIC NX GT IA GA PC GA_vAPIC
> [ 4.345735] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
> [ 4.345735] AMD-Vi: Extended features (0x58f77ef22294ade):
> [ 4.345736]  PPR X2APIC NX GT IA GA PC GA_vAPIC
> [ 4.345737] AMD-Vi: Interrupt remapping enabled
> [ 4.345738] AMD-Vi: virtual APIC enabled
> [ 4.345739] AMD-Vi: X2APIC enabled
> [ 4.345805] pci 0000:c0:00.2: irq 26 for MSI/MSI-X
> [ 4.345947] pci 0000:80:00.2: irq 27 for MSI/MSI-X
> [ 4.346073] pci 0000:40:00.2: irq 28 for MSI/MSI-X
> [ 4.346208] pci 0000:00:00.2: irq 29 for MSI/MSI-X
> [ 4.346305] AMD-Vi: IO/TLB flush on unmap enabled
> ...
>
> I have also tried using 'amd_iommu=fullflush' (as reflected in the
> "IO/TLB flush on unmap enabled" kernel message above) on a hunch after
> reviewing other users' posts with similar IO_PAGE_FAULT events, but
> this doesn't seem to change anything -- the events still occur with or
> without this kernel parameter.
>
> So, any guidance/tips/advice on how to tackle this would be greatly
> appreciated. Thank you for your consideration and time!
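In case it helps anyone triaging similar reports: a quick per-device tally of the IO_PAGE_FAULT lines makes it easy to see whether one endpoint dominates or the faults are spread across devices (as they are here). A minimal sketch, with the log path and embedded sample lines being just an illustration; on a live host you would feed it from `dmesg` or /var/log/messages instead:

```shell
# Tally IO_PAGE_FAULT events per PCI device from saved kernel logs.
# /tmp/iommu_faults.log is a throwaway sample; the lines are from the
# excerpt quoted above.
cat > /tmp/iommu_faults.log <<'EOF'
Nov 29 10:50:00 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=42:00.0 domain=0x005e address=0xfffffffdf8030000 flags=0x0008]
Nov 29 22:02:03 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=c8:02.1 domain=0x005f address=0xfffffffdf8060000 flags=0x0008]
Nov 30 02:13:54 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=42:00.0 domain=0x005e address=0xfffffffdf8020000 flags=0x0008]
Nov 30 02:28:44 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=c8:02.0 domain=0x005e address=0xfffffffdf8020000 flags=0x0008]
EOF
# Extract the device= field, then count occurrences per device.
grep -o 'device=[0-9a-f:.]*' /tmp/iommu_faults.log | sort | uniq -c | sort -rn
```

With the sample above, device 42:00.0 shows up twice and each of the others once, which mirrors the "all devices, not all at once" pattern described in the report.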
I booted the systems with "amd_iommu_intr=legacy" and the problem went
away! No more IO_PAGE_FAULTs in the hypervisor, and no NIC hangs/resets
in the virtual machines! No noticeable degradation of I/O performance
either. Confirmed on two systems.

--Marc

>
> --Marc

_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
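For anyone wanting to apply the same workaround persistently on a RHEL/CentOS 7 host, the parameter needs to end up on the kernel command line via GRUB. A sketch of the edit, run here against a throwaway sample copy rather than the real /etc/default/grub (whose contents vary by host; the GRUB_CMDLINE_LINUX value below is only an example):

```shell
# Append amd_iommu_intr=legacy to GRUB_CMDLINE_LINUX in a sample copy
# of /etc/default/grub. The existing options shown are placeholders.
cat > /tmp/grub.sample <<'EOF'
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
EOF
# Insert the parameter just before the closing quote of the line.
sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"$/\1 amd_iommu_intr=legacy"/' /tmp/grub.sample
grep GRUB_CMDLINE_LINUX /tmp/grub.sample
```

On a real RHEL 7 system you would then regenerate the config (e.g. `grub2-mkconfig -o /boot/grub2/grub.cfg` on a BIOS host), reboot, and confirm with `grep amd_iommu_intr /proc/cmdline`.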
