On Sun, May 09, 2021 at 01:50:58PM +0000, Dave Voutila wrote:
> 
> Mike Larkin writes:
> 
> > On Sat, May 08, 2021 at 08:14:35AM -0400, Dave Voutila wrote:
> >>
> >> Josh Rickmar writes:
> >>
> >> > On Fri, May 07, 2021 at 04:19:18PM -0400, Dave Voutila wrote:
> >> >>
> >> >> Josh Rickmar writes:
> >> >>
> >> >> >>Synopsis:     vmm protection fault trap
> >> >> >>Category:     vmm
> >> >> >>Environment:
> >> >> >       System      : OpenBSD 6.9
> >> >> >       Details     : OpenBSD 6.9-current (GENERIC.MP) #6: Thu May  6 10:16:53 MDT 2021
> >> >> >                        [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >> >> >
> >> >> >       Architecture: OpenBSD.amd64
> >> >> >       Machine     : amd64
> >> >> >>Description:
> >> >> >
> >> >> > My nixos vm is causing the host kernel to crash (after cold boot) with
> >> >> > 'protection fault trap, code=0'.  The guest is running Linux 5.11.14
> >> >> > (guest dmesg included after the host dmesg below).  I've also attached
> >> >> > a screenshot of ddb showing the backtrace and registers.
> >> >> >
> >> >> >>How-To-Repeat:
> >> >> >
> >> >> > The crash can be reliably triggered by doing heavy disk IO on the vm.
> >> >> > Upgrading the VM actually got the nixos install wedged during an
> >> >> > initial crash, and attempting to repair it with "nix-build -A system
> >> >> > '<nixpkgs/nixos>' --repair" is reliably repeating the crash.
> >> >>
> >> >> Any chance you've experienced this with a non-NixOS guest? I can't
> >> >> reproduce this error on my Ryzen5 Pro host.
> >> >>
> >>
> >> I've reproduced this locally with the help of abieber@. Seems I just
> >> need to boot a nixos iso (nixos-21.05pre287333.63586475587-x86_64) and
> >> try installing a package like git into the ramdisk:
> >>
> >>   # nix-env -f '<nixpkgs>' -iA git
> >>
> >> I still haven't triggered this without nixos, but at least I can
> >> reproduce it locally now. :-)
> >>
> >> -dv
> >>
> >
> > robert@ reported this same bug a long time ago and I could never
> > reproduce it.
> >
> > I'll see if it repros against my R415 using these instructions.
> >
> > -ml
> 
> So far I haven't managed to trigger it using this diff. I don't know
> why, but maybe the guest is mucking with the GDTR? I checked our logic
> vs. netbsd nvmm's...as well as our acpi resume handling...and that's all
> I can think of to explain it.
> 
> 
> Index: sys/arch/amd64/amd64/vmm_support.S
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/vmm_support.S,v
> retrieving revision 1.17
> diff -u -p -r1.17 vmm_support.S
> --- sys/arch/amd64/amd64/vmm_support.S        13 Feb 2021 07:47:37 -0000      1.17
> +++ sys/arch/amd64/amd64/vmm_support.S        9 May 2021 13:45:08 -0000
> @@ -747,6 +747,7 @@ restore_host_svm:
>       popw    %ax             /* ax = saved TR */
> 
>       popq    %rdx
> +     lgdtq   (%rdx)
>       addq    $0x2, %rdx
>       movq    (%rdx), %rdx

I was able to repair my nix store with this diff (twice, first time on
a derived qcow2 image for testing).
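
For anyone following the diff: the added lgdtq reloads the host GDTR from
the saved descriptor that %rdx points at, and the two context lines after
it then read the 8-byte field at offset 2 of that same descriptor. Below
is a minimal C sketch of that 10-byte layout (2-byte limit followed by an
8-byte base, as lgdt/sgdt use in 64-bit mode). The struct and function
names are hypothetical and are not taken from the OpenBSD tree; the sketch
assumes a little-endian host and a GCC/Clang-style packed attribute.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical name. Mirrors the memory operand of lgdt/sgdt in
 * 64-bit mode: a 16-bit limit immediately followed by a 64-bit base.
 */
struct gdt_pseudo_desc {
	uint16_t limit;	/* size of the GDT in bytes, minus one */
	uint64_t base;	/* linear address of the GDT */
} __attribute__((packed));

/* Without packing, the compiler would pad the struct to 16 bytes. */
_Static_assert(sizeof(struct gdt_pseudo_desc) == 10,
    "pseudo-descriptor must be 10 bytes");

/* Read the limit from the first two bytes of a saved descriptor. */
static uint16_t
gdt_limit_from_bytes(const uint8_t *p)
{
	uint16_t limit;

	assert(p != NULL);
	memcpy(&limit, p, sizeof(limit));
	return limit;
}

/*
 * Read the base from offset 2, the C equivalent of
 * "addq $0x2, %rdx; movq (%rdx), %rdx" in the diff context.
 */
static uint64_t
gdt_base_from_bytes(const uint8_t *p)
{
	uint64_t base;

	assert(p != NULL);
	memcpy(&base, p + 2, sizeof(base));
	return base;
}
```

The memcpy calls avoid unaligned loads, since the base sits at an odd
2-byte offset inside the descriptor.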
