On 10/21/15 17:04, Laszlo Ersek wrote: > On 10/21/15 14:10, Laszlo Ersek wrote: >> On 10/15/15 00:25, Laszlo Ersek wrote: >> >>> Test environment and results: >>> >>> Host kernel: >>> - latest RHEL-7 development kernel (3.10.0-323.el7), with Paolo's >>> following patches backported by yours truly: >>> - KVM: x86: clean up kvm_arch_vcpu_runnable >>> - KVM: x86: fix SMI to halted VCPU >>> >>> QEMU: >>> - current upstream (c49d3411faae), with Paolo's patch applied: >>> - target-i386: allow any alignment for SMBASE >>> >>> Below, the meaning of "bitness=32" is: >>> * qemu-system-i386 >>> * -cpu coreduo,-nx >>> >>> Whereas "bitness=64" means: >>> * qemu-system-x86_64 >>> * no special -cpu flag >>> >>> For variable access verification, "efibootmgr" is invoked (without >>> options) at the guest OS (Fedlet 20141209) root prompt. >>> >>> bitness accel VCPUs result >>> ------- ----- ----- ----------------------------------------------- >>> 32 KVM 1 Fedlet 20141209 boots, S3 works, variables work >>> >>> 32 KVM 2 stuck in SMBASE relocation, APIC IDs look valid >> >> Alright, so I've dug into this. It's very interesting. >> >> First, here's the debug patch for edk2: >> >> ------------- >> diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c >> b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c >> index 0e39173..bcfa075 100644 >> --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c >> +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c >> @@ -442,11 +442,15 @@ SmmRelocateBases ( >> for (Index = 0; Index < mNumberOfCpus; Index++) { >> mRebased[Index] = FALSE; >> if (ApicId != (UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId) >> { >> + DEBUG ((EFI_D_VERBOSE, "%a: sending SMI IPI to APIC ID 0x%Lx\n", >> + __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId)); >> SendSmiIpi ((UINT32)gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId); >> // >> // Wait for this AP to finish its 1st SMI >> // >> while (!mRebased[Index]); >> + DEBUG ((EFI_D_VERBOSE, "%a: APIC ID 0x%Lx has processed its first >> SMI\n", >> + __FUNCTION__, gSmmCpuPrivate->ProcessorInfo[Index].ProcessorId)); >> } else { >> // >> // BSP will be Relocated later >> ------------- >> >> As one can expect, the first message appears in the log: >> >> ------------ >> SMRAM TileSize = 00000800 >> CPU[000] APIC ID=0000 SMBASE=7FFC1000 SaveState=7FFD0C00 Size=00000400 >> CPU[001] APIC ID=0001 SMBASE=7FFC1800 SaveState=7FFD1400 Size=00000400 >> SmmRelocateBases: sending SMI IPI to APIC ID 0x1 >> ------------ >> >> but the second message doesn't; the (!mRebased[Index]) condition >> never evaluates to false, so the loop is never exited. >> >> Second, I sought to analyze the KVM trace very carefully, against the >> SendSmiIpi() source code in edk2, and against the KVM source code. >> Here comes the kicker: KVM interprets the APIC ICR (high, low) writes >> correctly, injects the SMI, VCPU#1 wakes and enters SMM (!), then >> leaves SMM with a relocated SMBASE field (!!!). >> >> *However*, according to the KVM trace, the relocated SMBASE field is >> *wrong* -- the value being reported below, 0x7ffc1000, corresponds to >> CPU#0 above! >> >> ------------ >> qemu-system-i38-22085 [000] 13634.057590: kvm_enter_smm: vcpu 1: >> leaving SMM, smbase 0x7ffc1000 >> ------------ >> >> Then VCPU#1 goes on to do various things (I'm too lazy to analyze all >> those trace entries), but ultimately it reaches a HLT. And the busy >> wait in SmmRelocateBases() never completes, because vcpu #1 seems to >> have looked at VCPU#0's area. >> >> Given that this works with TCG, I *guess* it is either a KVM bug, or >> some visibility race. I'll have to look at more. > > I added more debug messages to edk2, and compared the logs generated > when running on KVM against those generated when running on TCG. The > point where things go sideways in edk2 is this: > > The SmmInitHandler() function > [UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c] has the following leading > comment: > > "C function for SMI handler. To change all processor's SMMBase Register." > > It is called from the assembly language entry point code, in System > Management Mode, for the purposes of initial SMBASE relocation. > > In this function, the processor that executes the function identifies > itself by calling GetApicId(). The retrieved APIC ID is located within > the ProcessorInfo array (in order to map the APIC ID to a CPU index), > and then the rest of the relocation code uses this CPU index, for poking > into the SMBASE and SaveState "tiles". > > Now, for VCPU#1, GetApicId() returns APIC ID 0x01 on TCG, and 0x00 on > KVM. Things go downhill from there.
More strangeness: after replacing all GetApicId() calls with GetInitialApicId() in PiSmmCpuDxeSmm, the SMBASE relocation succeeds for (32-bit, KVM, 2 VCPUs), but (64-bit, KVM, 2 VCPUs) remains stuck in the loop. %-/ Anyway I'll stop spamming you and the list for a while. Thanks Laszlo _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel