On 10/21/15 17:04, Laszlo Ersek wrote:
> On 10/21/15 14:10, Laszlo Ersek wrote:
>> On 10/15/15 00:25, Laszlo Ersek wrote:
>>
>>> Test environment and results:
>>>
>>> Host kernel:
>>> - latest RHEL-7 development kernel (3.10.0-323.el7), with Paolo's
>>> following patches backported by yours truly:
>>> - KVM: x86: clean up kvm_arch_vcpu_runnable
>>> - KVM: x86: fix SMI to halted VCPU
>>>
>>> QEMU:
>>> - current upstream (c49d3411faae), with Paolo's patch applied:
>>> - target-i386: allow any alignment for SMBASE
>>>
>>> Below, the meaning of "bitness=32" is:
>>> * qemu-system-i386
>>> * -cpu coreduo,-nx
>>>
>>> Whereas "bitness=64" means:
>>> * qemu-system-x86_64
>>> * no special -cpu flag
>>>
>>> For variable access verification, "efibootmgr" is invoked (without
>>> options) at the guest OS (Fedlet 20141209) root prompt.
>>>
>>> bitness accel VCPUs result
>>> ------- ----- ----- -----------------------------------------------
>>> 32 KVM 1 Fedlet 20141209 boots, S3 works, variables work
>>>
>>> 32 KVM 2 stuck in SMBASE relocation, APIC IDs look valid
>>
>> Alright, so I've dug into this. It's very interesting.
>>
>> First, here's the debug patch for edk2:
>>
>> -------------
>> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
>> b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
>> index 0e39173..bcfa075 100644
>> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
>> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
>> @@ -442,11 +442,15 @@ SmmRelocateBases (
>> for (Index = 0; Index < mNumberOfCpus; Index++) {
>> mRebased[Index] = FALSE;
>> if (ApicId != (UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId)
>> {
>> + DEBUG ((EFI_D_VERBOSE, "%a: sending SMI IPI to APIC ID 0x%Lx\n",
>> + __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId));
>> SendSmiIpi ((UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId);
>> //
>> // Wait for this AP to finish its 1st SMI
>> //
>> while (!mRebased[Index]);
>> + DEBUG ((EFI_D_VERBOSE, "%a: APIC ID 0x%Lx has processed its first
>> SMI\n",
>> + __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId));
>> } else {
>> //
>> // BSP will be Relocated later
>> -------------
>>
>> As one can expect, the first message appears in the log:
>>
>> ------------
>> SMRAM TileSize = 00000800
>> CPU[000] APIC ID=0000 SMBASE=7FFC1000 SaveState=7FFD0C00 Size=00000400
>> CPU[001] APIC ID=0001 SMBASE=7FFC1800 SaveState=7FFD1400 Size=00000400
>> SmmRelocateBases: sending SMI IPI to APIC ID 0x1
>> ------------
>>
>> but the second message doesn't; the (!mRebased[Index]) condition
>> never evaluates to false, so the loop is never exited.
>>
>> Second, I sought to analyze the KVM trace very carefully, against the
>> SendSmiIpi() source code in edk2, and against the KVM source code.
>> Here comes the kicker: KVM interprets the APIC ICR (high, low) writes
>> correctly, injects the SMI, VCPU#1 wakes and enters SMM (!), then
>> leaves SMM with a relocated SMBASE field (!!!).
>>
>> *However*, according to the KVM trace, the relocated SMBASE field is
>> *wrong* -- the value being reported below, 0x7ffc1000, corresponds to
>> CPU#0 above!
>>
>> ------------
>> qemu-system-i38-22085 [000] 13634.057590: kvm_enter_smm: vcpu 1:
>> leaving SMM, smbase 0x7ffc1000
>> ------------
>>
>> Then VCPU#1 goes on to do various things (I'm too lazy to analyze all
>> those trace entries), but ultimately it reaches a HLT. And the busy
>> wait in SmmRelocateBases() never completes, because vcpu #1 seems to
>> have looked at VCPU#0's area.
>>
>> Given that this works with TCG, I *guess* it is either a KVM bug, or
>> some visibility race. I'll have to look at more.
>
> I added more debug messages to edk2, and compared the logs generated
> when running on KVM against those generated when running on TCG. The
> point where things go sideways in edk2 is this:
>
> The SmmInitHandler() function
> [UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c] has the following leading
> comment:
>
> "C function for SMI handler. To change all processor's SMMBase Register."
>
> It is called from the assembly language entry point code, in System
> Management Mode, for the purposes of initial SMBASE relocation.
>
> In this function, the processor that executes the function identifies
> itself by calling GetApicId(). The retrieved APIC ID is located within
> the ProcessorInfo array (in order to map the APIC ID to a CPU index),
> and then the rest of the relocation code uses this CPU index, for poking
> into the SMBASE and SaveState "tiles".
>
> Now, for VCPU#1, GetApicId() returns APIC ID 0x01 on TCG, and 0x00 on
> KVM. Things go downhill from there.
More strangeness: after replacing all GetApicId() calls with
GetInitialApicId() in PiSmmCpuDxeSmm, the SMBASE relocation succeeds for
(32-bit, KVM, 2 VCPUs), but (64-bit, KVM, 2 VCPUs) remains stuck in the
loop. %-/
Anyway I'll stop spamming you and the list for a while.
Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel