On 10/26/15 02:43, Fan, Jeff wrote:
> Laszlo,
> 
> I could provide the patch to place AP into protected mode in
> ExitBootService Callback function for your test.

That would be awesome, thank you!
Laszlo

> 
> Thanks!
> Jeff
> 
> -----Original Message-----
> From: Laszlo Ersek [mailto:[email protected]] 
> Sent: Monday, October 26, 2015 9:33 AM
> To: Paolo Bonzini; Gerd Hoffmann; Justen, Jordan L; Kinney, Michael D; Fan, Jeff; Chen Fan
> Cc: edk2-devel-01
> Subject: Re: [edk2] OVMF SMM status 2015-Oct-24
> 
> Got some terrible news:
> 
> On 10/24/15 02:18, Laszlo Ersek wrote:
> 
> [snip]
> 
>> * QEMU:
>> - Current upstream "master", at bc79082e4cd1.
>>
>> - Plus the following patch applied:
>>   [PATCH] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
>>   http://thread.gmane.org/gmane.comp.emulators.qemu/371195
>>
>>
>> * edk2 / OVMF:
> 
> [snip]
> 
>> - In addition, matching the QEMU patch referenced above,
>>
>>   [PATCH v3 27/52] OvmfPkg: use relaxed AP SMM synchronization mode
>>
>>   becomes unnecessary and is dropped, *and* the incremental OVMF patch
>>   that I'm attaching now purely for illustration is squashed into
>>
>>   [PATCH v3 13/52] OvmfPkg: implement EFI_SMM_CONTROL2_PROTOCOL with a
>>                    DXE_RUNTIME_DRIVER
> 
> [snip]
> 
>> * Results:
>>
>>   accel  bits  guest OS         OS boots  efibootmgr works on  S3 resume
>>   -----  ----  ---------------  --------  -------------------  ---------
>>   TCG    32    Fedlet 20141209  pass[1]   BSP and AP           pass
>>
>>   TCG    64    F21 XFCE LiveCD  pass[1]   BSP and AP           fail[2]
>>
>>   KVM    32    Fedlet 20141209  pass      BSP and AP           pass
>>
>>   KVM    64    F21 XFCE LiveCD  pass      BSP and AP           fail[2]
>>
>>   KVM    64    Windows 8.1      pass      n/a                  fail[2]
>>
>> [1] Although the boot is successful, I'm seeing one worrying sign: it
>>     looks like sometime after boot (when OVMF is "all done"), the AP
>>     starts executing the firmware from flash (I can see the SEC messages
>>     up to and including "DecompressMemFvs"). I don't understand why this
>>     happens, but it doesn't seem right. In any case, it didn't break
>>     these tests.
> 
> I understand now why [1] happens. My QEMU and OVMF patches that implemented 
> broadcast SMI on writes to APM_CNT triggered a bug in UefiCpuPkg.
> 
> First, please look at the ProcessorToIdleState() function in 
> "UefiCpuPkg/CpuDxe/CpuMp.c". The leading comment is:
> 
>   Application Processors do loop routine
>   after switch to its own stack.
> 
> It has an infinite loop with CpuSleep() -- HLT -- and CpuPause() -- PAUSE -- 
> calls in it.
> 
> I added the following line to this function:
> 
> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c
> index 04c2f1f..d1a94b1 100644
> --- a/UefiCpuPkg/CpuDxe/CpuMp.c
> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c
> @@ -1295,6 +1295,7 @@ ProcessorToIdleState (
>      }
> 
>      CpuPause ();
> +    asm volatile ("outb %%al,%%dx": :"d" (0x402), "a" ('Q'));
>    }
> 
>    CpuSleep ();
> 
> This makes each AP print a Q character to the QEMU debug port every time it 
> is woken in the idle loop (spuriously or otherwise).
> 
> If I build OVMF without -D SMM_REQUIRE, this line of code practically never 
> runs.
> 
> If I build OVMF with -D SMM_REQUIRE (and, remember, with my SMI broadcast 
> patches in place), then every time the APs are dragged into SMM and *return* 
> from SMM, they run one iteration of this loop -- they wake up, print the Q 
> character, then go back to sleep. Not so bad, right? It just proves that an 
> SMI wakes a sleeping processor, and that the RSM needs to return to somewhere.
> 
> Now please turn your attention to:
> - the ExitBootServicesCallback() function in the same file
> - the commit message for that function (git 9840b129 / SVN r16397)
> - and the discussion from last November that led to it:
>   http://thread.gmane.org/gmane.comp.bios.tianocore.devel/11244
> 
> The ExitBootServicesCallback() function that we have now -- which sends an 
> INIT IPI to the APs, so they remain dormant until a (double) SIPI -- is good 
> enough *assuming* that nothing wakes those APs between
> ExitBootServices() and the time the runtime OS starts them up for good.
> 
> Unfortunately, this assumption is no longer valid, with the broadcast SMI 
> idea: the variable services can be (and are) called in said interval, and 
> they wake the dormant APs. The in-SMM stuff works just fine, but the APs have 
> nowhere to return to, when they see the RSM. They weren't *running* when they 
> got the SMI!
> 
> So that's why I see the APs uncontrollably rebooting under [1] above.
> 
> There are two options:
> 
> (1) I'm noping out of the broadcast SMI idea faster than you can say
>     "fandango on core" -- writing to APM_CNT will raise the SMI only on
>     the current processor.
> 
>     This means keeping Paolo's Relaxed Sync Mode patch. It also implies
>     a huge performance penalty for variable services that are executed
>     on APs -- on both TCG and KVM.
> 
>     Here's a comparison on KVM:
> 
>> [root@ovmf-fedora-q35 ~]# time taskset -c 0 efibootmgr
>> BootCurrent: 0001
>> Timeout: 0 seconds
>> BootOrder: 0001,0000,0003
>> Boot0000* EFI SCSI Device
>> Boot0001* Fedora
>> Boot0003* EFI Internal Shell
>>
>> real    0m0.006s
>> user    0m0.000s
>> sys     0m0.005s <-------
> 
> versus
> 
>> [root@ovmf-fedora-q35 ~]# time taskset -c 1 efibootmgr
>> BootCurrent: 0001
>> Timeout: 0 seconds
>> BootOrder: 0001,0000,0003
>> Boot0000* EFI SCSI Device
>> Boot0001* Fedora
>> Boot0003* EFI Internal Shell
>>
>> real    0m36.013s
>> user    0m1.637s
>> sys     0m34.374s <------- 3-4 orders of magnitude slower
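
For reference, the slowdown between the two runs above can be checked with a few lines of C. This is only a sketch of the arithmetic on the two "sys" times quoted above (0.005 s on CPU 0 versus 34.374 s on CPU 1); the helper names are made up for illustration and do not come from any of the patches.

```c
/* Arithmetic on the two "sys" times quoted above. The helper names are
 * hypothetical, chosen only for this illustration. */

/* Ratio of the slow measurement to the fast one. */
double slowdown(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* Number of whole powers of ten contained in ratio (0 for ratio < 10). */
int orders_of_magnitude(double ratio)
{
    int orders = 0;

    while (ratio >= 10.0) {
        ratio /= 10.0;
        orders++;
    }
    return orders;
}
```

With the figures above, slowdown(34.374, 0.005) comes out a bit under 7000, which sits between three and four orders of magnitude.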
> 
> 
> (2) The other option is keeping the broadcast SMI idea (the actual way
>     to configure that in QEMU remains a topic of discussion), but then
>     the ExitBootServices() callback in CpuDxe will have to implement
>     Jeff's idea from
> 
> http://thread.gmane.org/gmane.comp.bios.tianocore.devel/11244/focus=11247
> 
>     That is, make all APs turn off paging, and execute a halt loop.
> 
>     (So the RSM instruction from the SMI handler has a place to return
>     to, until the runtime OS takes control of those APs.
> 
>     Also, I think the ExitBootServices() callback shouldn't return
>     until it is sure that all APs are safely off of their original code
>     and stacks, so some synchronization will be necessary too...)
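
The synchronization mentioned above could look roughly like the following. This is a minimal, host-side sketch of the handshake only, assuming C11 atomics are available; it does not perform the mode switch or the halt loop, and ParkBarrier, park_barrier_init(), ap_report_parked() and all_aps_parked() are hypothetical names, not UefiCpuPkg APIs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bookkeeping shared between the BSP and the APs. */
typedef struct {
    atomic_uint parked;    /* APs that have reached the safe halt loop */
    unsigned    ap_count;  /* number of APs the BSP is waiting for */
} ParkBarrier;

/* Called by the BSP before it asks the APs to move. */
void park_barrier_init(ParkBarrier *barrier, unsigned ap_count)
{
    atomic_init(&barrier->parked, 0u);
    barrier->ap_count = ap_count;
}

/* Assumption: each AP calls this exactly once, at the point where it no
 * longer depends on its original code and stack. */
void ap_report_parked(ParkBarrier *barrier)
{
    atomic_fetch_add(&barrier->parked, 1u);
}

/* The BSP polls this; true once every AP has checked in. */
bool all_aps_parked(ParkBarrier *barrier)
{
    return atomic_load(&barrier->parked) >= barrier->ap_count;
}
```

In this model the ExitBootServices() callback would poll all_aps_parked() and return only once it reports true. Where exactly an AP can safely report in (presumably from the relocated halt loop itself) is left open here.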
> 
> I'm sorry but I don't know enough to work on mode switches, so I can't 
> volunteer for (2). Can someone else please?
> 
> Thanks
> Laszlo
> 

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel
