On 10/15/15 00:25, Laszlo Ersek wrote:

> Test environment and results:
> 
> Host kernel:
> - latest RHEL-7 development kernel (3.10.0-323.el7), with Paolo's
>   following patches backported by yours truly:
>   - KVM: x86: clean up kvm_arch_vcpu_runnable
>   - KVM: x86: fix SMI to halted VCPU
> 
> QEMU:
> - current upstream (c49d3411faae), with Paolo's patch applied:
>   - target-i386: allow any alignment for SMBASE
> 
> Below, the meaning of "bitness=32" is:
> * qemu-system-i386
> * -cpu coreduo,-nx
> 
> Whereas "bitness=64" means:
> * qemu-system-x86_64
> * no special -cpu flag
> 
> For variable access verification, "efibootmgr" is invoked (without
> options) at the guest OS (Fedlet 20141209) root prompt.
> 
>   bitness  accel  VCPUs  result
>   -------  -----  -----  -----------------------------------------------
>   32       KVM    1      Fedlet 20141209 boots, S3 works, variables work
> 
>   32       KVM    2      stuck in SMBASE relocation, APIC IDs look valid

Alright, so I've dug into this. It's very interesting.

First, here's the debug patch for edk2:

-------------
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
index 0e39173..bcfa075 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
@@ -442,11 +442,15 @@ SmmRelocateBases (
   for (Index = 0; Index < mNumberOfCpus; Index++) {
     mRebased[Index] = FALSE;
     if (ApicId != (UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId) {
+      DEBUG ((EFI_D_VERBOSE, "%a: sending SMI IPI to APIC ID 0x%Lx\n",
+        __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId));
       SendSmiIpi ((UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId);
       //
       // Wait for this AP to finish its 1st SMI
       //
       while (!mRebased[Index]);
+      DEBUG ((EFI_D_VERBOSE, "%a: APIC ID 0x%Lx has processed its first SMI\n",
+        __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId));
     } else {
       //
       // BSP will be Relocated later
-------------

As one can expect, the first message appears in the log:

------------
SMRAM TileSize = 00000800
CPU[000]  APIC ID=0000  SMBASE=7FFC1000  SaveState=7FFD0C00  Size=00000400
CPU[001]  APIC ID=0001  SMBASE=7FFC1800  SaveState=7FFD1400  Size=00000400
SmmRelocateBases: sending SMI IPI to APIC ID 0x1
------------

but the second message doesn't; the (!mRebased[Index]) condition never 
evaluates to false, so the loop is never exited.
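
For debugging, the wait can be sketched with a spin limit so that a missing handshake is reported instead of hanging. This is an illustration only, not edk2 code: wait_rebased() and max_spins are hypothetical names, and the flag array merely stands in for mRebased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical bounded variant of `while (!mRebased[Index]);`.
 * Returns true once the flag for `index` is observed set, or false
 * after max_spins polls without seeing it.
 */
static bool wait_rebased(const volatile bool *rebased, size_t index,
                         unsigned long max_spins)
{
    assert(rebased != NULL);

    for (unsigned long spin = 0; spin < max_spins; spin++) {
        if (rebased[index]) {
            return true;   /* the AP signalled its first SMI */
        }
    }
    return false;          /* flag never set: the case seen in the log */
}
```

With such a limit, the caller could print the APIC ID that failed to respond. Note that a flag set for a different index does not satisfy the wait either.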

Second, I sought to analyze the KVM trace very carefully, against the 
SendSmiIpi() source code in edk2, and against the KVM source code. Here comes 
the kicker: KVM interprets the APIC ICR (high, low) writes correctly, injects 
the SMI, VCPU#1 wakes and enters SMM (!), then leaves SMM with a relocated 
SMBASE field (!!!).

*However*, according to the KVM trace, the relocated SMBASE field is *wrong* -- 
the value reported below, 0x7ffc1000, is the SMBASE logged for CPU[000] above, 
not the 0x7ffc1800 logged for CPU[001]!

------------
 qemu-system-i38-22085 [000] 13634.057590: kvm_enter_smm:        vcpu 1: leaving SMM, smbase 0x7ffc1000
------------
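
The mismatch can be checked arithmetically against the debug log. The sketch below is only an illustration, not edk2 or KVM code: the helper names are hypothetical, and the layout rule (SMBASE grows by TileSize per CPU index, SaveState sits 0xFC00 above SMBASE) is an assumption read off the two CPU[...] log lines.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the debug log. */
#define FIRST_SMBASE   0x7FFC1000u  /* SMBASE printed for CPU[000] */
#define TILE_SIZE      0x00000800u  /* "SMRAM TileSize" */
/* Assumption: SaveState - SMBASE, as both CPU[...] lines imply. */
#define SAVE_STATE_OFF 0x0000FC00u

/* Hypothetical helper: SMBASE expected for the CPU at cpu_index. */
static uint32_t expected_smbase(uint32_t cpu_index)
{
    return FIRST_SMBASE + cpu_index * TILE_SIZE;
}

/* Hypothetical helper: save state address expected for cpu_index. */
static uint32_t expected_save_state(uint32_t cpu_index)
{
    return expected_smbase(cpu_index) + SAVE_STATE_OFF;
}

/* Returns the CPU index whose tile starts at smbase, or -1 if none. */
static int smbase_owner(uint32_t smbase, uint32_t cpu_count)
{
    assert(cpu_count > 0);

    for (uint32_t i = 0; i < cpu_count; i++) {
        if (expected_smbase(i) == smbase) {
            return (int)i;
        }
    }
    return -1;
}
```

Under these assumptions, smbase_owner(0x7ffc1000, 2) yields index 0, while the trace attributes that value to vcpu 1, whose expected SMBASE is 0x7ffc1800.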

Then VCPU#1 goes on to do various things (I haven't analyzed all of those 
trace entries), but ultimately it reaches a HLT. The busy wait in 
SmmRelocateBases() never completes, because VCPU#1 seems to have looked at 
VCPU#0's area.

Given that this works with TCG, I *guess* it is either a KVM bug, or some 
visibility race. I'll have to look into it more.

Thanks
Laszlo

> 
>   32       TCG    1      Fedlet 20141209 boots, S3 works, variables work
> 
>   32       TCG    2      Fedlet 20141209 boots, variables (efibootmgr)
>                          are broken -- nothing is printed

(The variable issue has been addressed by my QEMU patch, which is being 
pulled; S3 remains to be verified / fixed.)

> 
>   64       KVM    >=1    "KVM: entry failed, hardware error 0x80000021"
>                          while guest in SMBASE relocation

(host kernel build in progress, after which this error will hopefully go away, 
and the results will be identical to the 32-bit case)

>   64       TCG    1      F21 XFCE LiveCD boots, variable access OK, S3
>                          resume triggers InternalX86EnablePaging64()
>                          ASSERT() in
>                          "MdePkg/Library/BaseLib/X64/Non-existing.c".
>                          Looks like a bug in S3Resume2Pei?
> 
>   64       TCG    2      F21 XFCE LiveCD boots, variable access
>                          (efibootmgr) is broken -- reports EINVAL

(The variable issue has been addressed by my QEMU patch, which is being 
pulled; S3 remains to be verified / fixed.)
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel