------- Comment From mdr...@us.ibm.com 2016-10-04 18:34 EDT-------
Some observations:

1) QEMU appears to be sending the 'device-removed' event prematurely.
The below output shows that the device's VFIO group FD is still open by
the QEMU process at the time it signals libvirt that the device
unplug/cleanup has completed:

root@ltc-fire1:~# virsh event ltc-fire1-vm3-ubuntu-16.10 --event device-removed 
&& lsof /dev/vfio/7
event 'device-removed' for domain ltc-fire1-vm3-ubuntu-16.10: hostdev0
events received: 1

COMMAND     PID         USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
qemu-syst 31231 libvirt-qemu   26u   CHR  242,0      0t0  750 /dev/vfio/7

2) In response to this event, libvirt issues the following sequence to
rebind the VF:

echo $DEVID >/sys/bus/pci/drivers/vfio-pci/unbind
echo $DEVID >/sys/bus/pci/drivers_probe

3) On the VFIO side, this consistently leads to mlx5_core attempting to
bind to the device while VFIO is still running it's cleanup routines:

[  120.099498] KVM guest htab at c000000f2b000000 (order 26), LPID 1
[  120.208235] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..3fffffff 
pg=1000
[  138.281730] pci 0001:01: 0.2: [PE# 005] Setting up window#1 
800000000000000..8000001ffffffff pg=10000
[  396.873573] vfio-pci 0001:01:00.2: No device request channel registered, 
blocked until released by user
[  396.873791] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0
[  396.873796] pci 0001:01: 0.2: [PE# 005] Removing DMA window #1
[  396.873908] mlx5_core 0001:01:00.2: enabling device (0000 -> 0002)
[  396.873940] mlx5_core 0001:01:00.2: Using 32-bit DMA via iommu
[  396.874034] mlx5_core 0001:01:00.2: firmware version: 12.17.1010

The full cleanup path should include something like:
[ 4762.425039] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0
[ 4762.425043] pci 0001:01: 0.2: [PE# 005] Removing DMA window #1
[ 4762.432014] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..7fffffff 
pg=1000
[ 4762.432018] pci 0001:01: 0.2: [PE# 005] Enabling 64-bit DMA bypass

So the driver is attempting to enable the device before the default DMA
windows have been restored

4) The sleep Carol inserted above in VFIO cleanup path seems to avoid
the issue. This suggests that the reprobe doesn't blindly run but
instead waits for a signal of some sort, but that that signaling seems
to happen prematurely without the explicit sleep.

This probably needs to be addressed at multiple levels, a fix in QEMU to
defer the device-deleted event until VFIO has cleanup up the device, and
a fix in VFIO path to avoid crashing the host if someone were to issue
the reprobe manually while the device is still in use.

A possible workaround that might be worth trying in the meantime is
specifying managed='no' in the device XML, which according to libvirt
documentation would prevent libvirt from automatically rebinding the
device back to default in the host after unplug. But I saw mention that
maybe this wasn't supported yet for KVM, so it's not a given.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1630304

Title:
  Ubuntu 16.10 KVM: Issue doing hotplug detach to SRIOV VF

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  I can not get hotplug attach to work in Ubuntu but if I try to detach a CX4 
VF from a guest I am getting some issues:
  Like in this case:
  [  474.393308] vfio-pci 0001:01:00.3: No device request channel registered, 
blocked until released by user
  [  474.393543] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0
  [  474.393553] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1
  [  474.393906] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002)
  [  474.393939] mlx5_core 0001:01:00.3: Using 32-bit DMA via iommu
  [  474.400360] pci 0001:01: 0.3: [PE# 006] Setting up window#0 0..7fffffff 
pg=1000
  [  474.400380] mlx5_core 0001:01:00.3: firmware version: 12.17.226
  [  474.401341] pci 0001:01: 0.3: [PE# 006] Enabling 64-bit DMA bypass
  [  474.402284] EEH: Frozen PE#6 on PHB#1 detected
  [  474.402475] EEH: PE location: Slot4, PHB location: N/A
  [  474.403699] EEH: This PCI device has failed 1 times in the last hour
  [  474.403700] EEH: Notify device drivers to shutdown
  [  474.403707] mlx5_core 0001:01:00.3: mlx5_pci_err_detected was called
  [  474.403711] mlx5_core 0001:01:00.3: 
0001:01:00.3:mlx5_enter_error_state:115:(pid 779): start
  [  474.403870] mlx5_core 0001:01:00.3: 
0001:01:00.3:mlx5_enter_error_state:120:(pid 779): end

  
  One time I saw 
  SSep 13 09:41:32 ltc-fire1 kernel: [70437.943722] vfio-pci 0001:01:00.3: No 
device request channel registered, blocked until released by user
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944076] mlx5_core 0001:01:00.3: 
enabling device (0000 -> 0002)
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944110] mlx5_core 0001:01:00.3: 
Using 32-bit DMA via iommu
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944145] pci 0001:01: 0.3: [PE# 006] 
Removing DMA window #0
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944152] pci 0001:01: 0.3: [PE# 006] 
Removing DMA window #1
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944195] mlx5_core 0001:01:00.3: 
firmware version: 12.17.226
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944260] Unable to handle kernel 
paging request for data at address 0x00000000
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944533] Faulting instruction 
address: 0xc0000000005b37e0
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944592] Oops: Kernel access of bad 
area, sig: 11 [#1]
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944636] SMP NR_CPUS=2048 NUMA PowerNV
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944851] Modules linked in: vfio_pci 
irqbypass vfio_iommu_spapr_tce vfio_virqfd vfio vfio_spapr_eeh xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables 
ip6table_filter ip6_tables iptable_filter rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) 
iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) 
mlx5_core(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) mlx4_en(OE) 
ib_addr(OE) ib_netlink(OE) mlx4_core(OE) mlx_compat(OE) bridge stp llc joydev 
input_leds mac_hid ofpart at24 cmdlinepart powernv_flash ipmi_powernv 
nvmem_core uio_pdrv_genirq opal_prd mtd ipmi_msghandler uio ibmpowernv 
powernv_rng binfmt_misc dm_multipath knem(OE) ip_tables x_tables autofs4 
hid_generic usbhid hid uas usb_storage ast i2c_algo_bit ttm drm_kms_helper 
syscopyarea sysfillrect sys
 imgblt fb_sys_fops drm ahci devlink libahci [last unloaded: mlx4_core]
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946007] CPU: 40 PID: 12501 Comm: 
libvirtd Tainted: G           OE   4.7.0unofficial #5
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946074] task: c000000ec319a200 ti: 
c000000ec324c000 task.ti: c000000ec324c000
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946140] NIP: c0000000005b37e0 LR: 
c0000000005ad070 CTR: 0000000000000000
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946208] REGS: c000000ec324f100 TRAP: 
0300   Tainted: G           OE    (4.7.0unofficial)
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946286] MSR: 900000010280b033 
<SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 84028844  XER: 20000000
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] CFAR: c000000000008468 DAR: 
0000000000000000 DSISR: 40000000 SOFTE: 0
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR00: c0000000005d19c8 
c000000ec324f380 c0000000013bef00 0000000000000000
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR04: ffffffffffffffff 
0000000000000000 ffffffffffffffff 000000000000ffff
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR08: 0000000000000000 
0000000000000000 ffffffffffffffff f000000003803140
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR12: 0000000024048840 
c00000000fb96800 c000000ec0def080 c000000ec0def000
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR16: 00003fff93ef0000 
0000000000000001 0000000000000000 c000000ec0def100
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR20: c000000ec0def118 
0000000000000000 0000000000010000 0000000000000000
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR24: 0000000000010000 
ffffffffffff0000 0000000000000000 000000000000ffff
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.946533] GPR28: 0000000000000000 
ffffffffffffffff 0000000000010000 0000000000000000
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947437] NIP [c0000000005b37e0] 
_find_next_bit.part.0+0x10/0x90
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947496] LR [c0000000005ad070] 
bitmap_find_next_zero_area_off+0x70/0x110
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947553] Call Trace:
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947578] [c000000ec324f380] 
[c000000ec324f3c0] 0xc000000ec324f3c0 (unreliable)
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947659] [c000000ec324f3e0] 
[c0000000005d19c8] iommu_area_alloc+0x78/0x130
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947731] [c000000ec324f440] 
[c000000000041a0c] iommu_range_alloc+0x18c/0x410
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947812] [c000000ec324f500] 
[c000000000041f24] iommu_alloc+0x74/0x190
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947885] [c000000ec324f570] 
[c0000000000437b8] iommu_alloc_coherent+0x138/0x1e0
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.947968] [c000000ec324f620] 
[c000000000041538] dma_iommu_alloc_coherent+0x58/0x80
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948057] [c000000ec324f670] 
[d000000012e75eb0] mlx5_cmd_init+0x118/0x940 [mlx5_core]
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948146] [c000000ec324f740] 
[d000000012e722f0] mlx5_load_one+0x198/0x16c0 [mlx5_core]
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948235] [c000000ec324f8b0] 
[d000000012e73e5c] init_one+0x3b4/0x5e0 [mlx5_core]
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948314] [c000000ec324f950] 
[c0000000006042dc] local_pci_probe+0x6c/0x140
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948381] [c000000ec324f9e0] 
[c0000000006051b8] pci_device_probe+0x178/0x200
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948462] [c000000ec324fa40] 
[c0000000006f6cb0] driver_probe_device+0x220/0x580
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948542] [c000000ec324fad0] 
[c0000000006f348c] bus_for_each_drv+0x8c/0x100
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948611] [c000000ec324fb20] 
[c0000000006f67b0] __device_attach+0x140/0x220
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.948679] [c000000ec324fbb0] 
[c0000000006f4510] bus_rescan_devices_helper+0x70/0xc0

  
  I think the key is that I am seeing this trace with 32bit dma which is maybe 
causing issue
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.943722] vfio-pci 0001:01:00.3: No 
device request channel registered, blocked until released by user
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944076] mlx5_core 0001:01:00.3: 
enabling device (0000 -> 0002)
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944110] mlx5_core 0001:01:00.3: 
Using 32-bit DMA via iommu
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944145] pci 0001:01: 0.3: [PE# 006] 
Removing DMA window #0
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944152] pci 0001:01: 0.3: [PE# 006] 
Removing DMA window #1
  Sep 13 09:41:32 ltc-fire1 kernel: [70437.944195] mlx5_core 0001:01:00.3: 
firmware version: 12.17.226

   
  ---uname output---
  Linux ltc-fire1 4.7.0unofficial #5 SMP Mon Sep 5 08:53:38 EDT 2016 ppc64le 
ppc64le ppc64le GNU/Linux
   
  Machine Type = P8 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   Im using a kernel 4.7 plus the qemu is from Mike Roth qemu with DDW. 
  To recreate just need to bring a guest with VF and then I did 
  virsh detach-device ltc-fire1-vm2-ubuntu-16.10 ./add_cx4vf.xml --live

  
  This should be ship issue given that we now have whole sriov hotplug broken 
on Ubuntu 16.10

  > Hi Carol,
  > 
  > Just to confirm, have you hit this with the new QEMU package from Jon's PPA?
  > It should be based on QEMU 2.6.1, which should have some fixes related to 
EEH

  I have not tried. Will do during the day.

  As far as the unplug issues, I'm unable to reproduce any of the host
  errors noted in the initial comments using the latest 2.6.1-based
  package. I'm only able to reproduce the following guest errors in
  dmesg. Carol, can you confirm whether or not this is also the case for
  you?

  Here's what I'm seeing, but it's intermittent. Sometimes I see up to 7
  successful unplugs before reproducing, but once I do reproduce from
  what I can tell the trace is always this:

  [   63.314271] Unable to handle kernel paging request for data at address 
0x00000000
  [   63.327986] Faulting instruction address: 0xd000000000ea5cbc
  [   63.328388] Oops: Kernel access of bad area, sig: 11 [#1]
  [   63.328741] SMP NR_CPUS=2048 NUMA pSeries
  [   63.329085] Modules linked in: rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) 
iw_cm(OE) configfs ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) ib_ipoib(OE) ib_cm(OE) 
mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) 
mlx4_en(OE) mlx4_core(OE) sg ghash_generic gf128mul vmx_crypto ip_tables 
x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache sr_mod cdrom virtio_net 
virtio_blk ibmvscsi scsi_transport_srp mlx5_core(OE) crc32c_vpmsum ptp pps_core 
mlx_compat(OE) virtio_pci virtio_ring virtio 
  [   63.332759] CPU: 14 PID: 5234 Comm: drmgr Tainted: G        W  OE   
4.8.0-15-generic #16-Ubuntu
  [   63.332911] task: c000000003bb9780 task.stack: c000000003cd8000
  [   63.333025] NIP: d000000000ea5cbc LR: d000000000ea5cb4 CTR: 
c0000000002ddc10
  [   63.333151] REGS: c000000003cdb6d0 TRAP: 0300   Tainted: G        W  OE    
(4.8.0-15-generic)
  [   63.333305] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  
CR: 22000228  XER: 20000000
  [   63.333716] CFAR: c000000000142f3c DAR: 0000000000000000 DSISR: 40000000 
SOFTE: 1 
  GPR00: d000000000ea5cb4 c000000003cdb950 d000000000f288e8 0000000000000000  
  GPR04: c0000000b3716b40 d000000000ea5b48 c0000000010f6a38 c0000000010d7b00  
  GPR08: c000000000fa7b00 0000000000000000 0000000000000000 d000000000eeff70 
  GPR12: c0000000002ddc10 c00000000fb87e00 0000000000000000 0000000000000000  
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
  GPR24: c000000004ff5b58 c000000004ff5b20 0000000000000001 c000000004fc04e8  
  GPR28: c000000004fc0000 0000000000000000 c000000004fc0290 0000000000000000  
  [   63.335484] NIP [d000000000ea5cbc] destroy_msg_cache+0x84/0x130 [mlx5_core]
  [   63.335609] LR [d000000000ea5cb4] destroy_msg_cache+0x7c/0x130 [mlx5_core]
  [   63.335713] Call Trace:
  [   63.335805] [c000000003cdb950] [d000000000ea5cb4] 
destroy_msg_cache+0x7c/0x130 [mlx5_core] (unreliable)
  [   63.336003] [c000000003cdb9a0] [d000000000ea66e8] 
mlx5_cmd_cleanup+0x40/0x80 [mlx5_core]
  [   63.336206] [c000000003cdb9d0] [d000000000ea1a70] 
mlx5_unload_one+0x248/0x420 [mlx5_core]
  [   63.336383] [c000000003cdba90] [d000000000ea2028] remove_one+0xb0/0x170 
[mlx5_core]
  [   63.336539] [c000000003cdbb20] [c0000000005660e0] 
pci_device_remove+0x70/0x110
  [   63.336688] [c000000003cdbb60] [c000000000610e80] 
__device_release_driver+0xc0/0x1a0
  [   63.336828] [c000000003cdbb90] [c000000000610fa0] 
device_release_driver+0x40/0x70
  [   63.337015] [c000000003cdbbc0] [c00000000055a730] 
pci_stop_bus_device+0xe0/0x100
  [   63.337156] [c000000003cdbc00] [c00000000055a914] 
pci_stop_and_remove_bus_device_locked+0x34/0x60
  [   63.337321] [c000000003cdbc30] [c0000000005692b0] remove_store+0xb0/0xd0
  [   63.337458] [c000000003cdbc80] [c000000000608ef8] dev_attr_store+0x68/0xa0
  [   63.337571] [c000000003cdbcc0] [c0000000003cb3f0] sysfs_kf_write+0x80/0xb0
  [   63.337692] [c000000003cdbd00] [c0000000003ca25c] 
kernfs_fop_write+0x17c/0x250
  [   63.337825] [c000000003cdbd50] [c00000000030fc3c] __vfs_write+0x6c/0xe0
  [   63.337950] [c000000003cdbd90] [c000000000311144] vfs_write+0xd4/0x240
  [   63.338094] [c000000003cdbde0] [c000000000312e5c] SyS_write+0x6c/0x110
  [   63.338233] [c000000003cdbe30] [c0000000000095e0] system_call+0x38/0x108
  [   63.338364] Instruction dump:
  [   63.338429] 60420000 480498b9 e8410018 813e0060 7fa4eb78 7f83e378 7ffdfb78 
3929ffff                                                                        
             
  [   63.338697] 913e0060 4bfffe21 7fbff040 7fe3fb78 <ebff0000> 409effd0 
3bde0078 7fbed840
  [   63.338925] ---[ end trace a4fbfe789eac3ffc ]---
  [   63.344490]

  
  I'm using QEMU directly, and see the above after the device_del. I see no 
other errors in guest/host:

  root@ltc-fire1:~/mdroth# ./vfio-bind 0001:01:00.2
  unbinding 0001:01:00.2 via /sys/bus/pci/devices/0001:01:00.2/driver/unbind
  binding 0001:01:00.2
  echo 0x15b3 0x1014 >/sys/bus/pci/drivers/vfio-pci/new_id
  root@ltc-fire1:~/mdroth# ./vfio-bind 0001:01:00.2

  qemu-system-ppc64 --smp sockets=1,cores=2,threads=8 --enable-kvm -M
  pseries -m 3G,slots=32,maxmem=8G -device virtio-blk-
  pci,id=disk1,drive=drive1 -drive
  file=ubuntu16.10.base.img.4,aio=threads,cache=none,if=none,id=drive1
  -device spapr-vscsi,id=scsi0,reg=0x2000 -monitor unix:/tmp/mdroth-
  vm4-hmp.sock,server,nowait -qmp unix:/tmp/mdroth-
  vm4-qmp.sock,server,nowait -device virtio-net-
  pci,mac=52:54:00:12:34:01,netdev=net1,id=nic1 -netdev
  user,hostfwd=tcp:0.0.0.0:2224-:22,id=net1 -device vfio-
  pci,host=0001:01:00.2,id=vfio0 -L /root/mdroth-qemu/pc-bios -boot c
  -vga none -nographic

  root@ltc-fire1:~# rlwrap socat stdio unix-connect:/tmp/mdroth-vm4-hmp.sock 
  QEMU 2.6.1 monitor - type 'help' for more information
  (qemu) device_del vfio0

  I'm planning to do more debugging of the guest driver's
  mlx5_cmd_cleanup -> destroy_msg_cache cleanup path which seems to be
  triggering the error, but want to confirm I'm looking at the same
  issue that Carol is currently tracking.

  > Sometimes I see that one, but if I do not see that one then I see issues at
  > host which could be the stack trace that I put or an EEH. But the problems
  > at host the EEH or the stack trace I always see that the enable device is
  > happening before the 64bit dma is set back again to the card.

  You mean the enable device for the card in the guest after hotplug? or
  is this in the unplug path?

  Here's what I see when doing unplug:

  <bind 0001:01: 0.2 to vfio-pci>
  [ 8056.733150] pci 0001:01: 0.2: [PE# 005] Disabling 64-bit DMA bypass
  [ 8056.733157] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0
  [ 8056.733927] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..7fffffff 
pg=1000
  [ 8057.039487] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0
  [ 8057.073152] vfio-pci 0001:01:00.2: enabling device (0000 -> 0002)

  <start guest>
  [ 7805.107856] KVM guest htab at c000000f2b000000 (order 26), LPID 1
  <guest immediately sets up default 32-bit window>
  [ 7805.216462] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..3fffffff 
pg=1000
  <guest kernel boots, enables 64-bit window>
  [ 7815.386476] pci 0001:01: 0.2: [PE# 005] Setting up window#1 
800000000000000..8000001ffffffff pg=10000
  <unplug 0001:01: 0.2 (if guest kernel doesn't crash, if it crashes we never 
get past here and hang in guest>
  [ 7906.714683] pci 0001:01: 0.2: [PE# 005] Removing DMA window #0
  [ 7906.714690] pci 0001:01: 0.2: [PE# 005] Removing DMA window #1
  <host re-enables 32-bit window and re-enables 64-bit DMA bypass>
  [ 7906.714722] window_size 0x80000000
  [ 7906.715205] Calling iommu_init_table
  [ 7906.721714] Calling pnv_pci_ioda2_set_window
  [ 7906.721851] pci 0001:01: 0.2: [PE# 005] Setting up window#0 0..7fffffff 
pg=1000
  [ 7906.722334] pci 0001:01: 0.2: [PE# 005] Enabling 64-bit DMA bypass

  I'm unable to reproduce any host crashes in either case. If you're
  hitting an issue where maybe we're rebinding the vf back to mlx5_core
  in the host before the window re-init is complete then maybe that
  might explain what you're seeing. If that's the case it might be a
  race somewhere in libvirt when it "un-stubs"/re-assigns the device
  back to default mlx5_core driver. I recall some previous libvirt BZs
  around avoiding this sort of scenario but i'm not sure if they are
  still pending or not.

  I'll do some more digging to see if I can reproduce without libvirt
  later today (perhaps by attempting to re-bind to mlx5_core immediately
  after unplug...in my scenario i think no such re-binding occurs so
  maybe that's why i don't see the host crashes (or other guest
  crashes...though it's also strange to me even in this race scenario
  that the guest would crash do to host race since drmgr fixes are
  already in place to make sure guest kernel has released the device
  before unplug completes in host/libvirt. maybe i just haven't seen the
  guest crashes yet and those are also a separate issue from the host
  ones))

  I was consulting with Mike about this and the last experiment that wanted to 
try was to put a sleep in vfio so in that way qemu can do all the dma setup 
before libvirt does the enable device at the KVM host. 
  I put a sleep in  drivers/vfio/vfio.c in function vfio_del_group_dev and that 
kind of helps in my case. Not sure if that will be the solution but it works 
for my case. 
  diff -Naurp vfio.c.save vfio.c
  --- vfio.c.save 2016-09-28 10:11:30.047333880 -0500
  +++ vfio.c      2016-09-30 13:57:15.376564520 -0500
  @@ -34,6 +34,7 @@
   #include <linux/uaccess.h>
   #include <linux/vfio.h>
   #include <linux/wait.h>
  +#include <linux/delay.h>

   #define DRIVER_VERSION "0.3"
   #define DRIVER_AUTHOR  "Alex Williamson <alex.william...@redhat.com>"
  @@ -910,6 +911,7 @@ void *vfio_del_group_dev(struct device *

          vfio_device_put(device);

  +       msleep(2000);
          /*
           * If the device is still present in the group after the above
           * 'put', then it is in use and we need to request it from the

  
  Then in dmesg I see
  [ 1122.568409] pci 0001:01: 0.3: [PE# 006] Removing DMA window #0
  [ 1122.568415] pci 0001:01: 0.3: [PE# 006] Removing DMA window #1
  [ 1122.575117] pci 0001:01: 0.3: [PE# 006] Setting up window#0 0..7fffffff 
pg=1000
  [ 1122.575121] pci 0001:01: 0.3: [PE# 006] Enabling 64-bit DMA bypass
  [ 1124.584345] mlx5_core 0001:01:00.3: enabling device (0000 -> 0002)
  [ 1124.584379] mlx5_core 0001:01:00.3: Using 64-bit DMA iommu bypass
  [ 1124.584445] mlx5_core 0001:01:00.3: firmware version: 12.17.1010
  [ 1125.062922] mlx5_core 0001:01:00.3: Assigned random MAC address 
f2:c8:5a:8b:2a:b2
  [ 1125.108710] mlx5_core 0001:01:00.3 enP1p1s0f3: renamed from eth0

  Mirroring bug to Launchpad for Canonical's awareness...

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630304/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to