On 11/22/16 14:58, Evgeny Yakovlev wrote:
> Wow, that is more than I expected :)
> 
>> I wonder if you started to see this issue very recently.
> Very recently; however, we use a pretty old OVMF build, circa 2015.

Ugh. Please update OVMF first... A whole lot of things have changed in
edk2 this year.

> 
>> OVMF debug log
> Sorry, we didn't have it enabled when the VM crashed, and these crashes are
> very rare. We will try to capture it when it happens again.
> 
>> - your host CPU model,
> cpu family      : 6
> model           : 42
> model name      : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
> stepping        : 7
> 
>> - the host kernel (KVM) version,
> Our kernel is based roughly on RHEL 7.2 (kernel version 3.10.0-327.36.1),
> with some upstream KVM patches backported.
> 
>> - the guest CPU model,
> -cpu SandyBridge,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,-arat,-xsaveopt,-xgetbv1,-vmx,-xsavec,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vpindex,hv_runtime,hv_synic,hv_stimer,hv_reset,hv_crash
> 
>> - the guest CPU topology.
> 8 sockets, 1 core per socket, 1 thread per core
> 
> Hope that helps!

The fact that you are using 8 VCPUs is definitely relevant. However, I
don't think it would make sense to try to analyze any errors with an
OVMF / edk2 tree this old. Please try to reproduce the issue with a
fresh build from master.
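A minimal sketch of such a reproduction run, assuming a DEBUG build of OvmfPkg/OvmfPkgX64.dsc with the GCC5 toolchain tag, a QEMU binary named qemu-system-x86_64 (the name varies by distribution), and placeholder file names (disk.img, ovmf-debug.log) that do not come from this thread. It boots with the reported 8-socket, 1-core, 1-thread topology and routes OVMF's debug console (I/O port 0x402) to a file, following the debugcon approach described in OvmfPkg/README, which remains the authoritative reference. The helper names `run` and `boot_ovmf` are hypothetical.

```shell
#!/bin/sh
# Sketch only: firmware path, disk image and log file are placeholder
# assumptions; override them through the environment if needed.
OVMF_FD=${OVMF_FD:-Build/OvmfX64/DEBUG_GCC5/FV/OVMF.fd}
DISK=${DISK:-disk.img}
LOG=${LOG:-ovmf-debug.log}

# Hypothetical helper: when DRY_RUN is set, print the command one argument
# per line instead of executing it, so it can be reviewed first.
run() {
  if [ -n "$DRY_RUN" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}

# Hypothetical helper: boot OVMF with the reported guest topology and send
# OVMF's debug port output (I/O port 0x402) to $LOG.
boot_ovmf() {
  run qemu-system-x86_64 \
    -enable-kvm \
    -smp 8,sockets=8,cores=1,threads=1 \
    -bios "$OVMF_FD" \
    -debugcon "file:$LOG" \
    -global isa-debugcon.iobase=0x402 \
    -drive "file=$DISK,format=raw"
}
```

For example, `DRY_RUN=1 boot_ovmf` prints the assembled command line, and `boot_ovmf` runs it. The firmware image itself would come from something like `build -a X64 -t GCC5 -p OvmfPkg/OvmfPkgX64.dsc` on edk2 master, and the reporter's `-cpu` string can be appended unchanged.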

Thanks!
Laszlo

> 2016-11-22 16:41 GMT+03:00 Laszlo Ersek <[email protected]>:
> 
>> Hello Evgeny,
>>
>> On 11/22/16 13:57, Evgeny Yakovlev wrote:
>>> We are running Windows UEFI-based VMs on QEMU/KVM with OvmfPkg.
>>>
>>> Very rarely, we experience a crash when the VM tries to write to RO
>>> memory very early during the UEFI boot process.
>>>
>>> The crash happens when the VM executes this code in the interrupt handler:
>>> https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.asm#L244-L246
>>>
>>>
>>> fxsave [rdi], where RDI = 0xffe60
>>>
>>> This is bad: it points into the ISA BIOS F-segment area.
>>>
>>> This memory was mapped by QEMU for read-only access, which is reflected
>>> in the KVM EPT:
>>> 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
>>>
>>> This is a very early IRQ0 interrupt, presumably during an early
>>> initialization phase (SEC or PEI).
>>>
>>> It looks like CommonInterruptEntry does not switch to a separate stack and
>>> instead works on the interrupted context's stack, which was fairly close
>>> to the 1MB boundary when IRQ0 fired (RSP around 0x1002c0). By the time
>>> CommonInterruptEntry reached the highlighted code, it had already pushed
>>> the saved register context, and subtracting another 512 bytes for the
>>> FXSAVE area dropped RSP to 0xffe60, below 1MB and into the QEMU RO region.
>>>
>>> We are trying to figure out how best to fix this. Possible solutions are
>>> to switch to a separate stack in CommonInterruptEntry, to relocate the
>>> early OvmfPkg stack farther away from 1MB, to run with interrupts disabled
>>> until we reach a later phase, or maybe something else.
>>>
>>> Any comments would be much appreciated!
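The stack arithmetic in the report above can be re-checked with a short POSIX shell sketch. The addresses are taken from the report; the RSP value is only approximate ("around 0x1002c0"), so the results are estimates. The helper names `stack_used`, `in_isa_bios` and `min_safe_rsp` are hypothetical.

```shell
#!/bin/sh
# Sketch: POSIX arithmetic expansion accepts C-style hex constants.
RSP_AT_IRQ=$((0x1002c0))     # approximate RSP when IRQ0 fired (from the report)
FXSAVE_RDI=$((0xffe60))      # RDI used by "fxsave [rdi]" (from the report)
ONE_MB=$((0x100000))
ISA_BIOS_START=$((0xe0000))  # read-only isa-bios window, per the EPT dump
ISA_BIOS_END=$((0xfffff))

# Hypothetical helper: bytes of stack consumed between the interrupt and the
# fxsave, including the 512-byte FXSAVE reservation.
stack_used() {
  echo $((RSP_AT_IRQ - FXSAVE_RDI))
}

# Hypothetical helper: succeed if address $1 lies in the isa-bios window.
in_isa_bios() {
  [ "$1" -ge "$ISA_BIOS_START" ] && [ "$1" -le "$ISA_BIOS_END" ]
}

# Hypothetical helper: lowest RSP at interrupt time for which a handler frame
# of $1 bytes stays at or above 1 MB.
min_safe_rsp() {
  echo $((ONE_MB + $1))
}
```

With these values, `stack_used` gives 1120 bytes (0x460), while only 0x2c0 (704) bytes were left above 1MB when IRQ0 fired, so the frame necessarily reached into the isa-bios window. `min_safe_rsp 1120` gives 1049696 (0x100460), which is only meaningful to the extent that this particular handler frame size is representative.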
>>
>> I wonder if you started to see this issue very recently.
>>
>> I suspect (hope!) that the symptoms you are experiencing are a
>> consequence of a bug in UefiCpuPkg that I've debugged and fixed just
>> today. (I hope to post the patches today.)
>>
>> While testing those patches on your end will of course tell us if your
>> issue has the same root cause, you could gather a few more symptoms even
>> before I get around to posting the patches. The bug that I'm working on has
>> extremely varied crash symptoms (basically the APs wander off into the
>> weeds), and some of those symptoms have involved CpuExceptionHandlerLib.
>> The point is, by the time we get into CpuExceptionHandlerLib, all is
>> lost -- it is executing on an AP whose state is corrupt anyway. The
>> fxsave symptom is a red herring, most likely.
>>
>> CpuExceptionHandlerLib works fine otherwise, especially when invoked
>> from the BSP -- we've used the output dumped by CpuExceptionHandlerLib
>> to the serial port several times to track down issues.
>>
>> So, my request is that you please capture the OVMF debug log (please see
>> the "OvmfPkg/README" file for how). I'm curious whether it crashes where
>> and how I suspect.
>>
>> Also, it would help if you provided
>> - your host CPU model,
>> - the host kernel (KVM) version,
>> - the guest CPU model,
>> - the guest CPU topology.
>>
>> Thanks!
>> Laszlo
>>
> _______________________________________________
> edk2-devel mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/edk2-devel
> 
