On 10/15/15 00:25, Laszlo Ersek wrote:
> Test environment and results:
>
> Host kernel:
> - latest RHEL-7 development kernel (3.10.0-323.el7), with Paolo's
> following patches backported by yours truly:
> - KVM: x86: clean up kvm_arch_vcpu_runnable
> - KVM: x86: fix SMI to halted VCPU
>
> QEMU:
> - current upstream (c49d3411faae), with Paolo's patch applied:
> - target-i386: allow any alignment for SMBASE
>
> Below, the meaning of "bitness=32" is:
> * qemu-system-i386
> * -cpu coreduo,-nx
>
> Whereas "bitness=64" means:
> * qemu-system-x86_64
> * no special -cpu flag
>
> For variable access verification, "efibootmgr" is invoked (without
> options) at the guest OS (Fedlet 20141209) root prompt.
>
> bitness accel VCPUs result
> ------- ----- ----- -----------------------------------------------
> 32 KVM 1 Fedlet 20141209 boots, S3 works, variables work
>
> 32 KVM 2 stuck in SMBASE relocation, APIC IDs look valid
Alright, so I've dug into this. It's very interesting.
First, here's the debug patch for edk2:
-------------
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
index 0e39173..bcfa075 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
@@ -442,11 +442,15 @@ SmmRelocateBases (
for (Index = 0; Index < mNumberOfCpus; Index++) {
mRebased[Index] = FALSE;
if (ApicId != (UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId) {
+ DEBUG ((EFI_D_VERBOSE, "%a: sending SMI IPI to APIC ID 0x%Lx\n",
+ __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId));
SendSmiIpi ((UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId);
//
// Wait for this AP to finish its 1st SMI
//
while (!mRebased[Index]);
+ DEBUG ((EFI_D_VERBOSE, "%a: APIC ID 0x%Lx has processed its first SMI\n",
+ __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId));
} else {
//
// BSP will be Relocated later
-------------
As one can expect, the first message appears in the log:
------------
SMRAM TileSize = 00000800
CPU[000] APIC ID=0000 SMBASE=7FFC1000 SaveState=7FFD0C00 Size=00000400
CPU[001] APIC ID=0001 SMBASE=7FFC1800 SaveState=7FFD1400 Size=00000400
SmmRelocateBases: sending SMI IPI to APIC ID 0x1
------------
but the second message doesn't; the (!mRebased[Index]) condition never
evaluates to false, so the loop is never exited.
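For debugging a hang like this, the unbounded wait could be replaced by a
bounded one, so that the failure is reported rather than silent. The following
is only a minimal sketch in plain C, not edk2 code: "rebased" is a hypothetical
stand-in for mRebased[], and both the helper name and the spin limit are
assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for edk2's mRebased[] array (two CPUs, as in the
   log). It is volatile so the compiler re-reads the flag on every pass. */
static volatile bool rebased[2];

/* Hypothetical helper: poll rebased[index] at most max_spins times.
   Returns true if the flag was seen set, false if the limit ran out. */
static bool wait_for_rebase(uint32_t index, uint64_t max_spins)
{
    for (uint64_t spin = 0; spin < max_spins; spin++) {
        if (rebased[index]) {
            return true;
        }
    }
    /* One last look, so that max_spins == 0 still reports a set flag. */
    return rebased[index];
}
```

A caller would log the APIC ID and stop when wait_for_rebase() returns false,
instead of spinning forever.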
Second, I sought to analyze the KVM trace very carefully, against the
SendSmiIpi() source code in edk2, and against the KVM source code. Here comes
the kicker: KVM interprets the APIC ICR (high, low) writes correctly, injects
the SMI, VCPU#1 wakes and enters SMM (!), then leaves SMM with a relocated
SMBASE field (!!!).
*However*, according to the KVM trace, the relocated SMBASE field is *wrong* --
the value being reported below, 0x7ffc1000, corresponds to CPU#0 above!
------------
qemu-system-i38-22085 [000] 13634.057590: kvm_enter_smm: vcpu 1: leaving SMM, smbase 0x7ffc1000
------------
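The mismatch can be checked arithmetically from the log. The following is a
minimal sketch in plain C, not edk2 code: the helper names are made up for
illustration, and the layout rules (consecutive CPUs spaced exactly one
TileSize apart, save state at a fixed offset from SMBASE) are assumptions
inferred only from the two CPU[...] lines above.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the debug log above. */
#define SMRAM_TILE_SIZE   0x800u       /* "SMRAM TileSize = 00000800" */
#define CPU0_SMBASE       0x7FFC1000u  /* CPU[000] SMBASE */
/* Assumption: save state offset inferred from the CPU[000] line,
   0x7FFD0C00 - 0x7FFC1000. */
#define SAVE_STATE_OFFSET 0xFC00u

/* Hypothetical helper: SMBASE expected for the CPU at `index`. */
static uint32_t expected_smbase(uint32_t index)
{
    return CPU0_SMBASE + index * SMRAM_TILE_SIZE;
}

/* Hypothetical helper: save state address expected for the CPU at `index`. */
static uint32_t expected_save_state(uint32_t index)
{
    return expected_smbase(index) + SAVE_STATE_OFFSET;
}

/* Hypothetical helper: index of the CPU that owns `smbase`, or -1 if the
   value matches none of the first `num_cpus` CPUs. */
static int smbase_owner(uint32_t smbase, uint32_t num_cpus)
{
    for (uint32_t i = 0; i < num_cpus; i++) {
        if (expected_smbase(i) == smbase) {
            return (int)i;
        }
    }
    return -1;
}

/* Cross-check the helpers against the two CPU[...] lines in the log. */
static void check_against_log(void)
{
    assert(expected_smbase(1) == 0x7FFC1800u);
    assert(expected_save_state(0) == 0x7FFD0C00u);
    assert(expected_save_state(1) == 0x7FFD1400u);
}
```

With these values, smbase_owner(0x7ffc1000, 2) yields index 0: the SMBASE that
the trace reports for vcpu 1 is the one assigned to CPU[000], not the
0x7FFC1800 expected for CPU[001].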
Then VCPU#1 goes on to do various things (I'm too lazy to analyze all those
trace entries), but ultimately it reaches a HLT. The busy wait in
SmmRelocateBases() never completes, because VCPU#1 seems to have looked at
VCPU#0's area instead of its own.
Given that this works with TCG, I *guess* it is either a KVM bug or some
visibility race. I'll have to look into it more.
Thanks
Laszlo
>
> 32 TCG 1 Fedlet 20141209 boots, S3 works, variables work
>
> 32 TCG 2 Fedlet 20141209 boots, variables (efibootmgr)
> are broken -- nothing is printed
(the variable issue has been addressed by my QEMU patch, which is being pulled;
S3 is still to be verified / fixed)
>
> 64 KVM >=1 "KVM: entry failed, hardware error 0x80000021"
> while guest in SMBASE relocation
(host kernel build in progress, after which this error will hopefully go away,
and the results will be identical to the 32-bit case)
> 64 TCG 1 F21 XFCE LiveCD boots, variable access OK, S3
> resume triggers InternalX86EnablePaging64()
> ASSERT() in
> "MdePkg/Library/BaseLib/X64/Non-existing.c".
> Looks like a bug in S3Resume2Pei?
>
> 64 TCG 2 F21 XFCE LiveCD boots, variable access
> (efibootmgr) is broken -- reports EINVAL
(the variable issue has been addressed by my QEMU patch, which is being pulled;
S3 is still to be verified / fixed)