Hi Alex and Michael

>> For testing, I applied the following patch to qemu,
>> converting msix bar to 64 bit.
>> Guest did not seem to crash.
>> I booted Fedora Live CD 32 bit guest on a 32 bit host
>> to level 3 without crash, and verified that
>> the BAR is a 64 bit one, and that I got assigned an address
>> at fe000000.
>> command line I used:
>> qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot -drive
>> file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe
>> -device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci
>> -cdrom Fedora-15-i686-Live-LXDE.iso
>>
>> At boot prompt type tab and add '3' to kernel command line
>> to have guest boot into a fast text console instead
>> of a graphical one which is very slow.
>>
>> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
>> index 2ac87ea..5271394 100644
>> --- a/hw/virtio-pci.c
>> +++ b/hw/virtio-pci.c
>> @@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice
>> *vdev)
>>      memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
>>      if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
>>                                       &proxy->msix_bar, 1, 0)) {
>> -        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
>> +        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
>> +                         PCI_BASE_ADDRESS_MEM_TYPE_64,
>>                           &proxy->msix_bar);
>>      } else
>>          vdev->nvectors = 0;
>>
> I was also able to add MEM64 BARs to device assignment pretty trivially
> and it seems to work, guest sees 64bit BARs for an 82576 VF, programs it
> to an fexxxxxx address and it works.
>
> Alex
>
I'd suggest using ivshmem with a buffer size of 32MB to reproduce the
problem, for example in a 2.6.18 guest. The msix case is not failing because:

1. The buffer size is just 4KB - it will reprogram the range
   0xFFFFE000-0xFFFFFFFF (which doesn't overlap critical resources, so there
   is no immediate panic).
2. The memory_region_init function doesn't create a backing user memory
   region, so kvm does nothing about remapping in this case.

If you apply the following patch and add this to the qemu command line:

--device ivshmem,size=32,shm="shm"

---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);

     /* region for shared memory */
-    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
 }

 static void close_guest_eventfds(IVShmemState *s, int posn)
---

you get the following bootup log:

Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 console=tty0)
Linux version 2.6.18 (root@localhost.localdomain) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fffd000 (usable)
 BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
DMI 2.4 present.
No NUMA configuration found
Faking a node at 0000000000000000-000000007fffd000
Bootmem setup node 0 0000000000000000-000000007fffd000
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 17
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 515393
Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Using 100.000000 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2500.081 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 2266k data, 204k init)
Calibrating delay using timer specific routine.. 5030.07 BogoMIPS (lpj=10060155)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
MCE: warning: using only 10 banks
SMP alternatives: switching to UP code
Freeing SMP alternatives: 36k freed
ACPI: Core revision 20060707
activating NMI Watchdog ... done.
Using local APIC timer interrupts.
result 62501506
Detected 62.501 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... OK.
migration_cost=0
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI quirk: region b000-b03f claimed by PIIX4 ACPI
PCI quirk: region b100-b10f claimed by PIIX4 SMB
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0, disabled.
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
divide error: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff80388299>]  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
RSP: 0000:ffff81007e3a1e20  EFLAGS: 00010246
RAX: 00038d7ea4c68000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8057fc2b
RBP: ffff81007e2e28c0 R08: ffffffff8055b492 R09: ffff81007e39f510
R10: ffff81007e3a1e50 R11: 0000000000000098 R12: ffff81007e3a1e50
R13: 0000000000000000 R14: ffffffffff5fe000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack:  0000000000000000 ffffffff80847470 0000000000000000 0000000000000000
 0000000000000000 ffffffff8081e187 00000000fed00000 ffffffffff5fe000
 0000000300010001 0000000800000002 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8081e187>] late_hpet_init+0xa7/0xb2
 [<ffffffff8020717f>] init+0x139/0x2fe
 [<ffffffff8020a5b4>] child_rip+0xa/0x12
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
 [<ffffffff803544b6>] acpi_ds_init_one_object+0x0/0x82
 [<ffffffff80207046>] init+0x0/0x2fe
 [<ffffffff8020a5aa>] child_rip+0x0/0x12

Code: 48 f7 f6 83 7d 30 01 8b 75 34 48 89 45 20 49 8b 4c 24 08 48
RIP  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
 RSP <ffff81007e3a1e20>
 <0>Kernel panic - not syncing: Attempted to kill init!
 NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff8033fa93>]  [<ffffffff8033fa93>] __delay+0x6/0x10
RSP: 0000:ffff81007e3a1b50  EFLAGS: 00000293
RAX: 00000000000480f3 RBX: 0000000000000000 RCX: 000000008dea8c6a
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000265e28
RBP: 00000000000009b0 R08: 0000000000000000 R09: ffff8100010503d4
R10: 0000000000000001 R11: ffffffff8034e288 R12: 0000000000000000
R13: 000000000000000b R14: ffffffff8055bc9f R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack:  ffffffff80230a09 0000003000000008 ffff81007e3a1c48 ffff81007e3a1b78
 0000000000000246 ffffffff8055bc9f 0000000000000246 ffff81007e39f510
 0000000000000000 0000000000000000 ffff8100010503d4 0000000000000000
Call Trace:
 [<ffffffff80230a09>] panic+0x12c/0x12f
 [<ffffffff802338c5>] do_exit+0x85/0x87b
 [<ffffffff8020b0df>] kernel_math_error+0x0/0x90

Code: 0f 31 29 c8 48 39 f8 72 f5 c3 65 8b 04 25 2c 00 00 00 48 98
console shuts up ...
 <0>Kernel panic - not syncing: Attempted to kill init!

Please look at the HPET lines. The HPET is mapped at 0xfed00000, and the
ivshmem BAR is 32MB. During PCI enumeration ivshmem will corrupt the range
0xfe000000 - 0xffffffff, which overlaps the HPET memory. When Linux then
runs late_hpet_init, it finds garbage there, and that is what causes the
panic (the divide error in hpet_alloc).

Thanks,
Alexey