Laszlo,

I saw your simple patch to handle this case. Thanks! I will review/feedback it 
later.

Yes. On physical hw, Aps will not response SMI if Aps received SMI in WFSI 
state. But Aps will have one pending SMI and will enter into SMM once Aps 
receive Startup IPI.

Form section "Local APIC State After an INIT Reset" as below, the pending SMI 
should be cleared. But I do not verified it.
"Upon receiving an INIT through either of these mechanisms, the processor 
responds by beginning the initialization
process of the processor core and the local APIC."

Thanks!
Jeff

-----Original Message-----
From: Laszlo Ersek [mailto:ler...@redhat.com] 
Sent: Tuesday, October 27, 2015 6:31 AM
To: Kinney, Michael D
Cc: Fan, Jeff; edk2-de...@ml01.01.org; Paolo Bonzini
Subject: Re: [Patch 3/3] UefiCpuPkg/CpuDxe: Place APs into protected mode when 
ExitBootService

On 10/26/15 22:59, Kinney, Michael D wrote:
> Laszlo,
> 
> If an AP is in Wait-for-SIPI state, then the AP should ignore the SMI.  
> One description of this is in Intel(R) 64 and IA-32 Architectures 
> Software Developer's Manual Volume 3C, Section 34.2.

Thanks for the reference -- upon seeing Jeff's commit message, I did think that 
perhaps this had not been implemented in QEMU completely faithfully to physical 
hardware.

I sought to spell out both -- differently wrong -- behaviors that are caused by 
the current exit boot services callback: the spurious wakeup on QEMU (which 
might not be fully faithful to phys hw in its own right), that has nowhere to 
return on RSM, and the potential loss of wakeups on phys hw. (Which was 
described by Jeff in his commit message.)

> If QEMU could evaluate the AP state and not send an SMI to an AP in 
> Wait-forSIPI, then updating SMIs to broadcast to all AP should work 
> for SeaBios and OVMF.  All APs in SeaBios are in Wait-for-SIPI, so the 
> BSP will be the only CPU that receives the SMI.  In OVMF, all APs can 
> be in the HLT loop that Jeff Fan provided a patch for (and I see you 
> have a good update for), so the SMI can be delivered to BSP and all 
> APs.

I'd like to run this idea by Paolo (CC'd). Theoretically I might be able to 
respin my QEMU patch so that it looks at the APs' states right when one of the 
VCPUs writes to APM_CNT. But, as far as I recall, in QEMU it is two separate 
actions to generate / make an interrupt pending, and then to deliver it. 
Filtering out the SMI at generation time might not be the right thing to do. 
Doing it in the delivery / injection code seems possible, but that code is 
super quirky and I don't dare touch it.
(Worse, I think that code might exist separately for TCG (in QEMU) and KVM (in 
the host kernel).)

Also, I'm not sure that simply not sending the SMI when the AP is in 
wait-or-SIPI would be right. The section you referenced has a note like
this:

  [...] If a SMI is received while an application processor is in the
  wait for SIPI mode, the SMI will be pended. The processor then
  responds on receipt of a SIPI by immediately servicing the pended SMI
  and going into SMM before handling the SIPI.

This might not play so well with SeaBIOS -- SeaBIOS will probably not care 
about the pending SMI, but the OS after it might. I'm not sure what an OS is 
expected to do to boot an AP.

Is INIT-SIPI-SIPI a requirement (emphasis on "INIT") for the OS? And would that 
INIT clear the pending SMI "inherited" from SeaBIOS?

I wish we could do something simpler here... I still think that a QEMU command 
line option would be safest. The configuration couldn't be influenced by the 
guest, and responsibility would belong with higher level components (libvirt, 
or human command line user). SMM is pretty specialized already, so keeping the 
current behavior as default, and requiring another QEMU option for broadcasting 
SMIs should be acceptable.

Thanks!
Laszlo

> 
> Best regards,
> 
> Mike
> 
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:ler...@redhat.com]
>> Sent: Monday, October 26, 2015 10:37 AM
>> To: Fan, Jeff; edk2-de...@ml01.01.org
>> Cc: Kinney, Michael D
>> Subject: Re: [Patch 3/3] UefiCpuPkg/CpuDxe: Place APs into protected 
>> mode when ExitBootService
>>
>> On 10/26/15 14:47, Jeff Fan wrote:
>>> In original ExitBootService callback function, we will send INIT IPI 
>>> to all APs and place APs into wait-for-sipi state.
>>
>> Yes.
>>
>>> However, it prevents APs to
>>> response any SMI. As a result, SMM BSP needs to do a specified 
>>> timeout
>> delay
>>> on each SMI.
>>
>> These two sentences are not precise. The problem is not that SMI is 
>> not delivered  to those APs that are in wait-for-sipi.
>>
>> Instead, the problem is that the APs wake from the wait-for-sipi 
>> state, execute the SMI handler, and then the RSM makes them return 
>> into no-man's land.
>>
>> Perhaps this is not how physical hardware works, but this is what the 
>> symptoms I'm seeing on TCG imply.
>>
>> So is *that* our problem perhaps? Namely, should the APs that wait 
>> for SIPI simply not respond to the SMI?
>>
>>> This fix is to place APs into hlt-loop state to let APs response SMI 
>>> correctly.
>>
>> ... In our case: to allow the APs to return "somewhere" upon executing RSM.
>>
>> More comments below:
>>
>>> Moreover, we change AP mode into protected mode on x64 image. It 
>>> could
>> avoid
>>> access page table when AP runs hlt-loop code, because page table is 
>>> located
>> in
>>> boot service data range may re-used by OS.
>>>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jeff Fan <jeff....@intel.com>
>>> CC: Michael Kinney <michael.d.kin...@intel.com>
>>> CC: Laszlo Ersek <ler...@redhat.com>
>>> ---
>>>  UefiCpuPkg/CpuDxe/CpuDxe.inf       |  2 ++
>>>  UefiCpuPkg/CpuDxe/CpuMp.c          | 19 ++++++++++++++---
>>>  UefiCpuPkg/CpuDxe/CpuMp.h          | 12 +++++++++++
>>>  UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c | 37
>> +++++++++++++++++++++++++++++++++
>>>  UefiCpuPkg/CpuDxe/X64/ArchFuncs.c  | 42
>> ++++++++++++++++++++++++++++++++++++++
>>>  5 files changed, 109 insertions(+), 3 deletions(-)  create mode 
>>> 100644 UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c
>>>  create mode 100644 UefiCpuPkg/CpuDxe/X64/ArchFuncs.c
>>>
>>> diff --git a/UefiCpuPkg/CpuDxe/CpuDxe.inf
>> b/UefiCpuPkg/CpuDxe/CpuDxe.inf
>>> index 9db5303..8ea4c38 100644
>>> --- a/UefiCpuPkg/CpuDxe/CpuDxe.inf
>>> +++ b/UefiCpuPkg/CpuDxe/CpuDxe.inf
>>> @@ -62,6 +62,7 @@
>>>    Ia32/MpAsm.asm  | MSFT
>>>    Ia32/MpAsm.asm  | INTEL
>>>    Ia32/MpAsm.nasm | GCC
>>> +  Ia32/ArchFuncs.c
>>>
>>>  [Sources.X64]
>>>    X64/CpuAsm.asm | MSFT
>>> @@ -70,6 +71,7 @@
>>>    X64/MpAsm.asm  | MSFT
>>>    X64/MpAsm.asm  | INTEL
>>>    X64/MpAsm.nasm | GCC
>>> +  X64/ArchFuncs.c
>>>
>>>  [Protocols]
>>>    gEfiCpuArchProtocolGuid                       ## PRODUCES
>>> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c 
>>> index 04c2f1f..883b5d7 100644
>>> --- a/UefiCpuPkg/CpuDxe/CpuMp.c
>>> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c
>>> @@ -30,6 +30,8 @@ VOID *mApStackStart = 0;  volatile BOOLEAN 
>>> mAPsAlreadyInitFinished = FALSE;  volatile BOOLEAN 
>>> mStopCheckAllAPsStatus = TRUE;
>>>
>>> +volatile STARTUP_CODE *StartupCode;
>>> +
>>
>> This should be "extern".
>>
>> Normally both GCC and MSVC should reject linking this object file 
>> together with ApStartup.obj. Namely:
>>
>> - ApStartup.c has
>>
>>  volatile STARTUP_CODE *StartupCode = NULL;
>>
>>  which, according to the C standard, is an "external definition".
>>
>> - CpuMp.c has the above. According to the C standard, that qualifies 
>> as  a "tentative definition". Given this specific translation unit 
>> (i.e.,  CpuMp.c), the definition is actually an external definition 
>> too,  according to the C standard.
>>
>> Therefore both MSVC and GNU binutils are incorrect to link these two 
>> object files together; they should be rejected.
>>
>> As a side point, I checked CpuMp.obj with "nm", both without and with 
>> "extern". In the former case, the type of the symbol is "C" (common), 
>> otherwise it is "U" (undefined). The practical end result is the same 
>> (because "C" -- common -- behaves identically to "U" given the 
>> external definition in ApStartup.c). Nonetheless, this is a bug
>> (standards-non-conformance) in both toolchains.
>>
>> But I digress.
>>
>>>  EFI_MP_SERVICES_PROTOCOL  mMpServicesTemplate = {
>>>    GetNumberOfProcessors,
>>>    GetProcessorInfo,
>>> @@ -1659,11 +1661,22 @@ ExitBootServicesCallback (
>>>    IN VOID                     *Context
>>>    )
>>>  {
>>> +  EFI_STATUS                 Status;
>>> +
>>>    //
>>> -  // Avoid APs access invalid buff datas which allocated by 
>>> BootServices,
>>> -  // so we send INIT IPI to APs to let them wait for SIPI state.
>>> +  // Place AP in procted mode to run hlt-loop in ACPI NVS range.
>>> +  // It avoids OS crashed page table and this driver in boot service range.
>>>    //
>>> -  SendInitIpiAllExcludingSelf ();
>>> +  Status = mMpServicesTemplate.StartupAllAPs (
>>> +                        &mMpServicesTemplate,
>>> +                        ChangeApMode,
>>> +                        FALSE,
>>> +                        NULL,
>>> +                        0,
>>> +                        (VOID *) &StartupCode->CpuApRunHltLoopCode,
>>> +                        NULL
>>> +                        );
>>> +  ASSERT_EFI_ERROR (Status);
>>>  }
>>
>> * So, first of all I wanted to regression-test this change, so I 
>> built OVMF without -D SMM_REQUIRE, and booted another X64 Fedora 
>> guest of mine with it, using QEMU machine type "pc" (i440fx) and KVM.
>>
>> Unfortunately, this change crashes the boot process with a triple 
>> fault, according to the KVM trace. (I tested both with and without 
>> "extern".)
>>
>>  kvm_exit:             reason TRIPLE_FAULT rip 0x9f01f info 0 0
>>  ...
>>  kvm_userspace_exit:   reason KVM_EXIT_SHUTDOWN (8)
>>
>> * Second, I didn't think of using mMpServicesTemplate.StartupAllAPs()
>> for this purpose.
>>
>> I thought we'd use SendInitSipiSipiAllExcludingSelf(). But,
>> StartupAllAPs() seems reasonable I guess.
>>
>> Two points however:
>>
>> - I think we should wait in the ExitBootServicesCallback() function 
>> until we are sure that the APs are running the HLT loop. Otherwise we 
>> could return while some APs are still in our own idle loop (which is 
>> "boot services code" type memory). This would probably require some 
>> synchronization.
>>
>> - If there is only one processor (ie. no AP, just BSP), then Status 
>> will be EFI_NOT_STARTED -- that should be accepted too:
>>
>>  ASSERT (Status == EFI_SUCCESS || Status == EFI_NOT_STARTED);
>>
>> More comments below:
>>
>>>
>>>  /**
>>> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.h b/UefiCpuPkg/CpuDxe/CpuMp.h 
>>> index 907e993..3040b2e 100644
>>> --- a/UefiCpuPkg/CpuDxe/CpuMp.h
>>> +++ b/UefiCpuPkg/CpuDxe/CpuMp.h
>>> @@ -21,6 +21,7 @@
>>>  #include <Library/SynchronizationLib.h>  #include 
>>> <Library/HobLib.h>  #include <Library/ReportStatusCodeLib.h>
>>> +#include <Library/DebugLib.h>
>>>
>>>  #pragma pack(1)
>>>
>>> @@ -763,5 +764,16 @@ SetMtrrsFromBuffer (
>>>    IN VOID *Buffer
>>>    );
>>>
>>> +/**
>>> +  This function will place AP to Hlt-Loop state.
>>> +
>>> +  @param  Buffer  Pointer to Hlt-Loop entry point.
>>> +**/
>>> +VOID
>>> +EFIAPI
>>> +ChangeApMode (
>>> +  IN  VOID     *Buffer
>>> +  );
>>> +
>>>  #endif // _CPU_MP_H_
>>>
>>> diff --git a/UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c
>> b/UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c
>>> new file mode 100644
>>> index 0000000..32bfab5
>>> --- /dev/null
>>> +++ b/UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c
>>> @@ -0,0 +1,37 @@
>>> +/** @file
>>> +  Arch specific functions.
>>> +
>>> +  Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>  
>>> + This program and the accompanying materials  are licensed and made 
>>> + available under the terms and conditions of the
>> BSD License
>>> +  which accompanies this distribution.  The full text of the 
>>> + license may be
>> found at
>>> +  http://opensource.org/licenses/bsd-license.php
>>> +
>>> +  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS"
>> BASIS,
>>> +  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER
>> EXPRESS OR IMPLIED.
>>> +
>>> +**/
>>> +
>>> +#include <CpuMp.h>
>>> +
>>> +/**
>>> +  This function will place AP to Hlt-Loop state.
>>> +
>>> +  @param  Buffer  Pointer to Hlt-Loop entry point.
>>> +**/
>>> +VOID
>>> +EFIAPI
>>> +ChangeApMode (
>>> +  IN  VOID     *Buffer
>>> +  )
>>> +{
>>> +  STACKLESS_AP_ENTRY_POINT     ApHltLoopFunc;
>>> +
>>> +  ApHltLoopFunc = (STACKLESS_AP_ENTRY_POINT) (UINTN) Buffer;
>>> +  (*ApHltLoopFunc) ();
>>
>> Shouldn't we use SwitchStack() instead?
>>
>> I'm only asking because I recall that S3ResumePei uses SwitchStack() 
>> for non-local control transfer, when it doesn't need to switch CPU modes.
>>
>>> +
>>> +  //
>>> +  // If should never reach here
>>
>> "It"
>>
>>> +  //
>>> +  ASSERT (FALSE);
>>> +}
>>> diff --git a/UefiCpuPkg/CpuDxe/X64/ArchFuncs.c
>> b/UefiCpuPkg/CpuDxe/X64/ArchFuncs.c
>>> new file mode 100644
>>> index 0000000..4117f0a
>>> --- /dev/null
>>> +++ b/UefiCpuPkg/CpuDxe/X64/ArchFuncs.c
>>> @@ -0,0 +1,42 @@
>>> +/** @file
>>> +  Arch specific functions.
>>> +
>>> +  Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>  
>>> + This program and the accompanying materials  are licensed and made 
>>> + available under the terms and conditions of the
>> BSD License
>>> +  which accompanies this distribution.  The full text of the 
>>> + license may be
>> found at
>>> +  http://opensource.org/licenses/bsd-license.php
>>> +
>>> +  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS"
>> BASIS,
>>> +  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER
>> EXPRESS OR IMPLIED.
>>> +
>>> +**/
>>> +
>>> +#include <CpuMp.h>
>>> +#include <CpuGdt.h>
>>> +
>>> +/**
>>> +  This function will place AP to Hlt-Loop state.
>>> +
>>> +  @param  Buffer  Pointer to Hlt-Loop entry point.
>>> +**/
>>> +VOID
>>> +EFIAPI
>>> +ChangeApMode (
>>> +  IN  VOID     *Buffer
>>> +  )
>>> +{
>>> +  EFI_STATUS   Status;
>>> +
>>> +  AsmDisablePaging64 (
>>> +    SYS_CODE_SEL,
>>> +    (UINT32) (UINTN) Buffer,
>>> +    0,
>>> +    0,
>>> +    (UINT32) (UINTN) &Status
>>> +    );
>>
>> (I certainly thought of using AsmDisablePaging64() for this, but I 
>> couldn't figure out what to pass as "Cs". I can see SYS_CODE_SEL in 
>> "UefiCpuPkg/CpuDxe/CpuGdt.h"... I would have never found that.)
>>
>> I have one comment here (although it might not matter much): if this 
>> module is compiled for X64 DXE, I don't think it is guaranteed that 
>> the address of the current stack (i.e., &Status) can be expressed in 32-bits.
>>
>> Shouldn't we allocate a small array in STARTUP_CODE, to be used as 
>> stack here?
>>
>>> +  //
>>> +  // If should never reach here
>>
>> "It"
>>
>>> +  //
>>> +  ASSERT (FALSE);
>>> +}
>>>
>>
>> In any case, the biggest problem for me is that this doesn't seem to 
>> work :( Soon after I hit Enter in GRUB, the guest crashes with a 
>> triple fault.
>>
>> Here's an idea: how about using SendInitSipiSipiAllExcludingSelf()?
>> - no stack is necessary to set up (the HLT loop needs no stack)
>>
>> - no mode switching needed, code runs immediately in real mode
>>
>> - SendInitSipiSipiAllExcludingSelf() claims to return only after the  
>> IPI has been accepted by the target processors, so no need to wait /  
>> synchronize in the exit boot services callback until the APs leave  
>> "our" code (the idle loop).
>>
>> - It would take an independent, 4KB AcpiNVS allocation, independently  
>> of STARTUP_CODE, but that's easy to do.
>>
>> - An INIT-SIPI-SIPI sequence should also safely abort whatever job 
>> the  APs were doing at the time of ExitBootServices().
>>
>> What do you think?
>>
>> Thanks!
>> Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to