On Sun, May 09, 2021 at 01:50:58PM +0000, Dave Voutila wrote:
>
> Mike Larkin writes:
>
> > On Sat, May 08, 2021 at 08:14:35AM -0400, Dave Voutila wrote:
> >>
> >> Josh Rickmar writes:
> >>
> >> > On Fri, May 07, 2021 at 04:19:18PM -0400, Dave Voutila wrote:
> >> >>
> >> >> Josh Rickmar writes:
> >> >>
> >> >> >>Synopsis: vmm protection fault trap
> >> >> >>Category: vmm
> >> >> >>Environment:
> >> >> > System : OpenBSD 6.9
> >> >> > Details : OpenBSD 6.9-current (GENERIC.MP) #6: Thu May 6
> >> >> > 10:16:53 MDT 2021
> >> >> >
> >> >> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >> >> >
> >> >> > Architecture: OpenBSD.amd64
> >> >> > Machine : amd64
> >> >> >>Description:
> >> >> >
> >> >> > My nixos vm is causing the host kernel to crash (after cold boot) with
> >> >> > 'protection fault trap, code=0'. The guest is running Linux 5.11.14
> >> >> > (guest dmesg included after the host dmesg below). I've also attached
> >> >> > a screenshot of ddb showing the backtrace and registers.
> >> >> >
> >> >> >>How-To-Repeat:
> >> >> >
> >> >> > The crash can be reliably triggered by doing heavy disk IO on the vm.
> >> >> > Upgrading the VM actually got the nixos install wedged during an
> >> >> > initial crash, and attempting to repair it with "nix-build -A system
> >> >> > '<nixpkgs/nixos>' --repair" is reliably repeating the crash.
> >> >>
> >> >> Any chance you've experienced this with a non-NixOS guest? I can't
> >> >> reproduce this error on my Ryzen5 Pro host.
> >> >>
> >>
> >> I've reproduced this locally with the help of abieber@. Seems I just
> >> need to boot a nixos iso (nixos-21.05pre287333.63586475587-x86_64) and
> >> try installing a package like git into the ramdisk:
> >>
> >> # nix-env -f '<nixpkgs>' -iA git
> >>
> >> I still haven't triggered this without nixos, but at least I can
> >> reproduce it locally now. :-)
> >>
> >> -dv
> >>
> >
> > robert@ reported this same bug a long time ago and I could never reproduce
> > it.
> >
> > I'll see if it repros against my R415 using these instructions.
> >
> > -ml
>
> So far I haven't managed to trigger it using this diff. I don't know
> why, but maybe the guest is mucking with the GDTR? I checked our logic
> vs. netbsd nvmm's...as well as our acpi resume handling...and that's all
> I can think of to explain it.
>
>
> Index: sys/arch/amd64/amd64/vmm_support.S
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/vmm_support.S,v
> retrieving revision 1.17
> diff -u -p -r1.17 vmm_support.S
> --- sys/arch/amd64/amd64/vmm_support.S	13 Feb 2021 07:47:37 -0000	1.17
> +++ sys/arch/amd64/amd64/vmm_support.S	9 May 2021 13:45:08 -0000
> @@ -747,6 +747,7 @@ restore_host_svm:
>  	popw	%ax	/* ax = saved TR */
>
>  	popq	%rdx
> +	lgdtq	(%rdx)
>  	addq	$0x2, %rdx
>  	movq	(%rdx), %rdx

I was able to repair my nix store with this diff (twice, first time on a derived qcow2 image for testing).
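
For anyone following the diff: in 64-bit mode the memory operand that lgdt reads (and sgdt writes) is a 10-byte pseudo-descriptor, a 16-bit limit followed by a 64-bit base. That layout would explain why the existing code does addq $0x2 before loading through %rdx. Below is a minimal C sketch of that layout only, assuming a GCC/Clang-style packed attribute. The names struct gdtr_image and gdtr_base() are hypothetical and are not taken from the vmm(4) sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical name, for illustration only: the 10-byte image that
 * sgdt stores and lgdt loads in 64-bit mode.
 */
struct gdtr_image {
	uint16_t limit;	/* size of the GDT in bytes, minus one */
	uint64_t base;	/* linear address of the GDT */
} __attribute__((packed));

/* Compile-time checks: the base sits 2 bytes in, the image is 10 bytes. */
static_assert(offsetof(struct gdtr_image, base) == 2, "base at offset 2");
static_assert(sizeof(struct gdtr_image) == 10, "10-byte pseudo-descriptor");

/*
 * Read the base out of a raw pseudo-descriptor. This is the C analogue
 * of "addq $0x2, %rdx; movq (%rdx), %rdx": skip the limit, load 8 bytes.
 * memcpy is used because the base field is not naturally aligned.
 */
static uint64_t
gdtr_base(const void *p)
{
	uint64_t base;

	memcpy(&base, (const uint8_t *)p + offsetof(struct gdtr_image, base),
	    sizeof(base));
	return base;
}
```

This only illustrates the data layout. It says nothing about why reloading the GDTR on that path avoids the fault, which is still unexplained above.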
