Hi folks, this is my first post to the group. Apologies for length.
I've been experimenting with GPU passthrough on bhyve. For background, the host
system is FreeBSD 12.0-RELEASE on an AMD Ryzen 1700 CPU @ 3.8 GHz with 32 GB of
ECC RAM and two nVidia GPUs. I'm working with a Debian 9 Linux guest and a
Windows Server 2019 guest (with the desktop experience installed). I also have
a USB controller passed through for Bluetooth and a keyboard.
With some unpleasant hacks I have succeeded in starting X on the Linux guest,
passing through an nVidia GT 710 under the nouveau driver. I can run the MATE
desktop and glxgears, both of which are smooth at 4K. However, the Unigine
Heaven benchmark runs at an embarrassing 0.1 fps, and 2160p x264 video in VLC
plays at about 5 fps. Neither appears to be CPU-bound in the host or the guest.
The hack I had to make: I found that many instructions that access memory-mapped
PCI BARs are not being executed by the CPU in guest mode but are instead being
passed back to the hypervisor for emulation. This trips an assertion inside
passthru_write() in pci_passthru.c ["pi->pi_bar[baridx].type == PCIBAR_IO"],
which does not expect to perform memory-mapped I/O on behalf of the guest.
Examining the to-be-emulated instructions in vmexit_inst_emul() {e.g., movl
(%rdi), %eax}, they look benign to me, and I have no explanation for why the
CPU refused to execute them in guest mode.
As an amateur work-around, I removed the assertion; instead, I obtain the
desired offset into the guest's BAR, calculate what that guest address
translates to in the host's address space, open(2) /dev/mem, mmap(2) that
address, and perform the write directly. I do the same trick in
passthru_read(). Ugly and slow, but functional.
This code path is hit continuously whether or not X is running, with an
increase in activity when running anything GPU-heavy. The accesses always go to
BAR 1, mostly around the same offsets. I added some logging of these events; it
produces about 100 lines per second while playing video. An excerpt:
...
Unexpected out-of-vm passthrough write #492036 to bar 1 at offset 41100.
Unexpected out-of-vm passthrough write #492037 to bar 1 at offset 41100.
Unexpected out-of-vm passthrough read #276162 to bar 1 at offset 561280.
Unexpected out-of-vm passthrough write #492038 to bar 1 at offset 38028.
Unexpected out-of-vm passthrough write #492039 to bar 1 at offset 38028.
Unexpected out-of-vm passthrough read #276163 to bar 1 at offset 561184.
Unexpected out-of-vm passthrough read #276164 to bar 1 at offset 561184.
Unexpected out-of-vm passthrough read #276165 to bar 1 at offset 561184.
Unexpected out-of-vm passthrough read #276166 to bar 1 at offset 561184.
...
So my main question is:
1. How do I diagnose why these instructions are not being executed in guest
mode?
Some other problems:
2. Once the virtual machine is shut down, the passed-through GPU never gets
turned off: whatever message was on the screen in the final throes of Linux's
shutdown stays there. Perhaps there is a specific detach sequence that bhyve or
nouveau hasn't yet implemented? Alternatively, maybe I could exploit some
power-management feature to reset the card when bhyve exits.
3. It is not possible to reboot the guest and then start X again without an
intervening host reboot. The text console works fine, but Xorg.0.log shows
messages like
(EE) [drm] Failed to open DRM device for pci:0000:00:06.0: -19
(EE) open /dev/dri/card0: No such file or directory
dmesg is not very helpful either.[0] I suspect that this is related to problem
(2).
4. There is a known bug in the version of the Xorg server that ships with
Debian 9: if the GPU takes too long to respond to the driver, the switch from
an animated mouse cursor back to a static cursor causes the X server to sit in
a busy loop of gradually increasing stack depth.[1] For me, this consistently
happens after I type my password into the Debian login dialog, and eventually
(~ 120 minutes) it locks up the host by eating all the swap. A work-around is
to replace the guest's animated cursors with static ones. The bug is fixed in
newer versions of X, but I haven't yet tested whether their fix works for me.
5. The GPU doesn't come to life until the nouveau driver kicks in. What is
special about the driver? Why doesn't the UEFI initialize the GPU and send it
output before boot? Any idea whether the problem is on the UEFI side or the
hypervisor side?
6. The way Windows probes multi-BAR devices seems to be inconsistent with
bhyve's model for storing I/O memory mappings. Specifically, I believe Windows
writes the 0xffffffff sizing sentinel to all BARs on a device in one shot, then
reads them back and assigns the true addresses afterwards. bhyve, however,
treats the multiple 0xffffffff assignments to different BARs as a clash and
errors out on the second BAR probe. I removed most of the mmio_rb_tree error
handling in mem.c, and that is sufficient for Windows to boot and to detect and
correctly identify the GPU. (A better solution might be to handle the initial
0xffffffff write as a special case.) I can then install the official nVidia
drivers without problem over Remote Desktop. However, the GPU never springs
into life: I am stuck with a "Windows has stopped this device because it has
reported problems. (Code 43)" error in Device Manager, a blank screen, and not
much else to go on.
Is it worth my continuing to hack away at these problems (of course I'm happy
to share anything I come up with), or is there an official solution for GPU
support in the pipeline that is about to make my efforts redundant? :)
Thanks,
Robert Crowston.
---
Footnotes
[0] Diff'ing dmesg after successful GPU initialization (+) and after failure
(-), and cutting out some lines that aren't relevant:
nouveau 0000:00:06.0: bios: version 80.28.a6.00.10
+nouveau 0000:00:06.0: priv: HUB0: 085014 ffffffff (1f70820b)
nouveau 0000:00:06.0: fb: 1024 MiB DDR3
@@ -466,24 +467,17 @@
nouveau 0000:00:06.0: DRM: DCB conn 00: 00001031
nouveau 0000:00:06.0: DRM: DCB conn 01: 00002161
nouveau 0000:00:06.0: DRM: DCB conn 02: 00000200
-nouveau 0000:00:06.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
-nouveau 0000:00:06.0: timeout at
/build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:88/gf119_disp_dmac_init()!
-nouveau 0000:00:06.0: disp: ch 1 init: c207009b
-nouveau: DRM:00000000:0000927c: init failed with -16
-nouveau 0000:00:06.0: timeout at
/build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:54/gf119_disp_dmac_fini()!
-nouveau 0000:00:06.0: disp: ch 1 fini: c2071088
-nouveau 0000:00:06.0: timeout at
/build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/dmacgf119.c:54/gf119_disp_dmac_fini()!
-nouveau 0000:00:06.0: disp: ch 1 fini: c2071088
+[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
+[drm] Driver supports precise vblank timestamp query.
+nouveau 0000:00:06.0: DRM: MM: using COPY for buffer copies
+nouveau 0000:00:06.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffff96fdb39a1800
+fbcon: nouveaufb (fb0) is primary device
-nouveau 0000:00:06.0: timeout at
/build/linux-UEAD6s/linux-4.9.144/drivers/gpu/drm/nouveau/nvkm/engine/disp/coregf119.c:187/gf119_disp_core_fini()
-nouveau 0000:00:06.0: disp: core fini: 8d0f0088
-[TTM] Finalizing pool allocator
-[TTM] Finalizing DMA pool allocator
-[TTM] Zone kernel: Used memory at exit: 0 kiB
-[TTM] Zone dma32: Used memory at exit: 0 kiB
-nouveau: probe of 0000:00:06.0 failed with error -16
+Console: switching to colour frame buffer device 240x67
+nouveau 0000:00:06.0: fb0: nouveaufb frame buffer device
+[drm] Initialized nouveau 1.3.1 20120801 for 0000:00:06.0 on minor 0
[1]
https://devtalk.nvidia.com/default/topic/1028172/linux/titan-v-ubuntu-16-04lts-and-387-34-driver-crashes-badly/post/5230898/#5230898
_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to
"[email protected]"