On 10/21/15 17:04, Laszlo Ersek wrote:
> On 10/21/15 14:10, Laszlo Ersek wrote:
>> On 10/15/15 00:25, Laszlo Ersek wrote:
>>
>>> Test environment and results:
>>>
>>> Host kernel:
>>> - latest RHEL-7 development kernel (3.10.0-323.el7), with Paolo's
>>>   following patches backported by yours truly:
>>>   - KVM: x86: clean up kvm_arch_vcpu_runnable
>>>   - KVM: x86: fix SMI to halted VCPU
>>>
>>> QEMU:
>>> - current upstream (c49d3411faae), with Paolo's patch applied:
>>>   - target-i386: allow any alignment for SMBASE
>>>
>>> Below, the meaning of "bitness=32" is:
>>> * qemu-system-i386
>>> * -cpu coreduo,-nx
>>>
>>> Whereas "bitness=64" means:
>>> * qemu-system-x86_64
>>> * no special -cpu flag
>>>
>>> For variable access verification, "efibootmgr" is invoked (without
>>> options) at the guest OS (Fedlet 20141209) root prompt.
>>>
>>>   bitness  accel  VCPUs  result
>>>   -------  -----  -----  -----------------------------------------------
>>>   32       KVM    1      Fedlet 20141209 boots, S3 works, variables work
>>>
>>>   32       KVM    2      stuck in SMBASE relocation, APIC IDs look valid
>>
>> Alright, so I've dug into this. It's very interesting.
>>
>> First, here's the debug patch for edk2:
>>
>> -------------
>> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c 
>> b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
>> index 0e39173..bcfa075 100644
>> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
>> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c
>> @@ -442,11 +442,15 @@ SmmRelocateBases (
>>    for (Index = 0; Index < mNumberOfCpus; Index++) {
>>      mRebased[Index] = FALSE;
>>      if (ApicId != (UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId) 
>> {
>> +      DEBUG ((EFI_D_VERBOSE, "%a: sending SMI IPI to APIC ID 0x%Lx\n",
>> +        __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId));
>>        SendSmiIpi ((UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId);
>>        //
>>        // Wait for this AP to finish its 1st SMI
>>        //
>>        while (!mRebased[Index]);
>> +      DEBUG ((EFI_D_VERBOSE, "%a: APIC ID 0x%Lx has processed its first 
>> SMI\n",
>> +        __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId));
>>      } else {
>>        //
>>        // BSP will be Relocated later
>> -------------
>>
>> As one can expect, the first message appears in the log:
>>
>> ------------
>> SMRAM TileSize = 00000800
>> CPU[000]  APIC ID=0000  SMBASE=7FFC1000  SaveState=7FFD0C00  Size=00000400
>> CPU[001]  APIC ID=0001  SMBASE=7FFC1800  SaveState=7FFD1400  Size=00000400
>> SmmRelocateBases: sending SMI IPI to APIC ID 0x1
>> ------------
>>
>> but the second message doesn't; the (!mRebased[Index]) condition
>> never evaluates to false, so the loop is never exited.
>>
>> Second, I sought to analyze the KVM trace very carefully, against the
>> SendSmiIpi() source code in edk2, and against the KVM source code.
>> Here comes the kicker: KVM interprets the APIC ICR (high, low) writes
>> correctly, injects the SMI, VCPU#1 wakes and enters SMM (!), then
>> leaves SMM with a relocated SMBASE field (!!!).
>>
>> *However*, according to the KVM trace, the relocated SMBASE field is
>> *wrong* -- the value being reported below, 0x7ffc1000, corresponds to
>> CPU#0 above!
>>
>> ------------
>>  qemu-system-i38-22085 [000] 13634.057590: kvm_enter_smm:        vcpu 1: 
>> leaving SMM, smbase 0x7ffc1000
>> ------------
>>
>> Then VCPU#1 goes on to do various things (I'm too lazy to analyze all
>> those trace entries), but ultimately it reaches a HLT. And the busy
>> wait in SmmRelocateBases() never completes, because vcpu #1 seems to
>> have looked at VCPU#0's area.
>>
>> Given that this works with TCG, I *guess* it is either a KVM bug, or
>> some visibility race. I'll have to look at more.
> 
> I added more debug messages to edk2, and compared the logs generated
> when running on KVM against those generated when running on TCG. The
> point where things go sideways in edk2 is this:
> 
> The SmmInitHandler() function
> [UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c] has the following leading
> comment:
> 
> "C function for SMI handler. To change all processor's SMMBase Register."
> 
> It is called from the assembly language entry point code, in System
> Management Mode, for the purposes of initial SMBASE relocation.
> 
> In this function, the processor that executes the function identifies
> itself by calling GetApicId(). The retrieved APIC ID is located within
> the ProcessorInfo array (in order to map the APIC ID to a CPU index),
> and then the rest of the relocation code uses this CPU index, for poking
> into the SMBASE and SaveState "tiles".
> 
> Now, for VCPU#1, GetApicId() returns APIC ID 0x01 on TCG, and 0x00 on
> KVM. Things go downhill from there.

More strangeness: after replacing all GetApicId() calls with
GetInitialApicId() in PiSmmCpuDxeSmm, the SMBASE relocation succeeds for
(32-bit, KVM, 2 VCPUs), but (64-bit, KVM, 2 VCPUs) remains stuck in the
loop. %-/

Anyway I'll stop spamming you and the list for a while.

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to