Got some terrible news: On 10/24/15 02:18, Laszlo Ersek wrote:
[snip] > * QEMU: > - Current upstream "master", at bc79082e4cd1. > > - Plus the following patch applied: > [PATCH] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q' > http://thread.gmane.org/gmane.comp.emulators.qemu/371195 > > > * edk2 / OVMF: [snip] > - In addition, matching the QEMU patch referenced above, > > [PATCH v3 27/52] OvmfPkg: use relaxed AP SMM synchronization mode > > becomes unnecessary and is dropped, *and* the incremental OVMF patch > that I'm attaching now purely for illustration is squashed into > > [PATCH v3 13/52] OvmfPkg: implement EFI_SMM_CONTROL2_PROTOCOL with a > DXE_RUNTIME_DRIVER [snip] > * Results: > > accel bits guest OS OS boots efibootmgr works on S3 resume > ----- ---- --------------- -------- ------------------- --------- > TCG 32 Fedlet 20141209 pass[1] BSP and AP pass > > TCG 64 F21 XFCE LiveCD pass[1] BSP and AP fail[2] > > KVM 32 Fedlet 20141209 pass BSP and AP pass > > KVM 64 F21 XFCE LiveCD pass BSP and AP fail[2] > > KVM 64 Windows 8.1 pass n/a fail[2] > > [1] Although the boot is successful, I'm seeing one worrying sign: it > looks like sometime after boot (when OVMF is "all done"), the AP > starts executing the firmware from flash (I can see the SEC messages > up to and including "DecompressMemFvs"). I don't understand why this > happens, but it doesn't seem right. In any case, it didn't break > these tests. I understand now why [1] happens. My QEMU and OVMF patches that implemented broadcast SMI on writes to APM_CNT triggered a bug in UefiCpuPkg. First, please look at the ProcessorToIdleState() function in "UefiCpuPkg/CpuDxe/CpuMp.c". The leading comment is: Application Processors do loop routine after switch to its own stack. It has an infinite loop with CpuSleep() -- HLT -- and CpuPause() -- PAUSE -- calls in it. I added the following line to this function: diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c index 04c2f1f..d1a94b1 100644 --- a/UefiCpuPkg/CpuDxe/CpuMp.c +++ b/UefiCpuPkg/CpuDxe/CpuMp.c @@ -1295,6 +1295,7 @@ ProcessorToIdleState ( } CpuPause (); + asm volatile ("outb %%al,%%dx": :"d" (0x402), "a" ('Q')); } CpuSleep (); This makes each AP print a Q character to the QEMU debug port every time they are woken in the idle loop (spuriously or otherwise). If I build OVMF without -D SMM_REQUIRE, this line of code never runs, practically. If I build OVMF with -D SMM_REQUIRE (and, remember, with my SMI broadcast patches in place), then every time the APs are dragged into SMM and *return* from SMM, they run one iteration of this loop -- they wake up, print the Q character, then go back to sleep. Not so bad, right? It just proves that an SMI wakes a sleeping processor, and that the RSM needs to return to somewhere. Now please turn your attention to: - the ExitBootServicesCallback() function in the same file - the commit message for that function (git 9840b129 / SVN r16397) - and the discussion from last November that led to it: http://thread.gmane.org/gmane.comp.bios.tianocore.devel/11244 The ExitBootServicesCallback() function that we have now -- which sends an INIT IPI to the APs, so they remain dormant until a (double) SIPI -- is good enough *assuming* that nothing wakes those APs between ExitBootServices() and the time the runtime OS starts them up for good. Unfortunately, this assumption is no longer valid, with the broadcast SMI idea: the variable services can be (and are) called in said interval, and they wake the dormant APs. The in-SMM stuff works just fine, but the APs have nowhere to return to, when they see the RSM. They weren't *running* when they got the SMI! So that's why I see the APs uncontrollably rebooting under [1] above. There are two options: (1) I'm noping out of the broadcast SMI idea faster than you can say "fandango on core" -- writing to APM_CNT will raise the SMI only on the current processor. This means keeping Paolo's Relaxed Sync Mode patch. It also implies a huge performance penalty for variable services that are executed on APs -- on both TCG and KVM. Here's a comparison on KVM: > [root@ovmf-fedora-q35 ~]# time taskset -c 0 efibootmgr > BootCurrent: 0001 > Timeout: 0 seconds > BootOrder: 0001,0000,0003 > Boot0000* EFI SCSI Device > Boot0001* Fedora > Boot0003* EFI Internal Shell > > real 0m0.006s > user 0m0.000s > sys 0m0.005s <------- versus > [root@ovmf-fedora-q35 ~]# time taskset -c 1 efibootmgr > BootCurrent: 0001 > Timeout: 0 seconds > BootOrder: 0001,0000,0003 > Boot0000* EFI SCSI Device > Boot0001* Fedora > Boot0003* EFI Internal Shell > > real 0m36.013s > user 0m1.637s > sys 0m34.374s <------- 3-4 orders of magnitude slower (2) The other option is keeping the broadcast SMI idea (the actual way to configure that in QEMU remains a topic of discussion), but then the ExitBootServices() callback in CpuDxe will have to implement Jeff's idea from http://thread.gmane.org/gmane.comp.bios.tianocore.devel/11244/focus=11247 That is, make all APs turn off paging, and execute a halt loop. (So the RSM instruction from the SMI handler has a place to return to, until the runtime OS takes control of those APs. Also, I think the ExitBootServices() callback shouldn't return until it is sure that all APs are safely off of their original code and stacks, so some synchronization will be necessary too...) I'm sorry but I don't know enough to work on mode switches, so I can't volunteer for (2). Can someone else please? Thanks Laszlo _______________________________________________ edk2-devel mailing list [email protected] https://lists.01.org/mailman/listinfo/edk2-devel

