Mike Larkin writes:

> On Sat, May 08, 2021 at 08:14:35AM -0400, Dave Voutila wrote:
>>
>> Josh Rickmar writes:
>>
>> > On Fri, May 07, 2021 at 04:19:18PM -0400, Dave Voutila wrote:
>> >>
>> >> Josh Rickmar writes:
>> >>
>> >> >>Synopsis: vmm protection fault trap
>> >> >>Category: vmm
>> >> >>Environment:
>> >> > System : OpenBSD 6.9
>> >> > Details : OpenBSD 6.9-current (GENERIC.MP) #6: Thu May 6
>> >> > 10:16:53 MDT 2021
>> >> >
>> >> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> >> >
>> >> > Architecture: OpenBSD.amd64
>> >> > Machine : amd64
>> >> >>Description:
>> >> >
>> >> > My nixos vm is causing the host kernel to crash (after cold boot) with
>> >> > 'protection fault trap, code=0'. The guest is running Linux 5.11.14
>> >> > (guest dmesg included after the host dmesg below). I've also attached
>> >> > a screenshot of ddb showing the backtrace and registers.
>> >> >
>> >> >>How-To-Repeat:
>> >> >
>> >> > The crash can be reliably triggered by doing heavy disk IO on the vm.
>> >> > Upgrading the VM actually got the nixos install wedged during an
>> >> > initial crash, and attempting to repair it with "nix-build -A system
>> >> > '<nixpkgs/nixos>' --repair" is reliably repeating the crash.
>> >>
>> >> Any chance you've experienced this with a non-NixOS guest? I can't
>> >> reproduce this error on my Ryzen5 Pro host.
>> >>
>> >>
>> I've reproduced this locally with the help of abieber@. Seems I just
>> need to boot a nixos iso (nixos-21.05pre287333.63586475587-x86_64) and
>> try installing a package like git into the ramdisk:
>>
>> # nix-env -f '<nixpkgs>' -iA git
>>
>> I still haven't triggered this without nixos, but at least I can
>> reproduce it locally now. :-)
>>
>> -dv
>>
>
> robert@ reported this same bug a long time ago and I could never reproduce it.
>
> I'll see if it repros against my R415 using these instructions.
>
> -ml

So far I haven't managed to trigger it using this diff. I don't know why,
but maybe the guest is mucking with the GDTR?
I checked our logic against NetBSD nvmm's, as well as our ACPI resume
handling, and that's all I can think of to explain it.

Index: sys/arch/amd64/amd64/vmm_support.S
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/vmm_support.S,v
retrieving revision 1.17
diff -u -p -r1.17 vmm_support.S
--- sys/arch/amd64/amd64/vmm_support.S	13 Feb 2021 07:47:37 -0000	1.17
+++ sys/arch/amd64/amd64/vmm_support.S	9 May 2021 13:45:08 -0000
@@ -747,6 +747,7 @@ restore_host_svm:
 	popw	%ax	/* ax = saved TR */
 	popq	%rdx
+	lgdtq	(%rdx)
 	addq	$0x2, %rdx
 	movq	(%rdx), %rdx
