On 10/26/15 22:59, Kinney, Michael D wrote:
> Laszlo,
> 
> If an AP is in Wait-for-SIPI state, then the AP should ignore the
> SMI.  One description of this is in Intel(R) 64 and IA-32
> Architectures Software Developer's Manual Volume 3C, Section 34.2.

Thanks for the reference -- upon seeing Jeff's commit message, I did
think that perhaps this had not been implemented in QEMU completely
faithfully to physical hardware.

I sought to spell out both -- differently wrong -- behaviors that are
caused by the current exit boot services callback: the spurious wakeup
on QEMU (which might not be fully faithful to phys hw in its own right),
that has nowhere to return on RSM, and the potential loss of wakeups on
phys hw. (Which was described by Jeff in his commit message.)

> If QEMU could evaluate the AP state and not send an SMI to an AP in
> Wait-forSIPI, then updating SMIs to broadcast to all AP should work
> for SeaBios and OVMF.  All APs in SeaBios are in Wait-for-SIPI, so
> the BSP will be the only CPU that receives the SMI.  In OVMF, all APs
> can be in the HLT loop that Jeff Fan provided a patch for (and I see
> you have a good update for), so the SMI can be delivered to BSP and
> all APs.

I'd like to run this idea by Paolo (CC'd). Theoretically I might be able
to respin my QEMU patch so that it looks at the APs' states right when
one of the VCPUs writes to APM_CNT. But, as far as I recall, in QEMU it
is two separate actions to generate / make an interrupt pending, and
then to deliver it. Filtering out the SMI at generation time might not
be the right thing to do. Doing it in the delivery / injection code
seems possible, but that code is super quirky and I don't dare touch it.
(Worse, I think that code might exist separately for TCG (in QEMU) and
KVM (in the host kernel).)

Also, I'm not sure that simply not sending the SMI when the AP is in
wait-or-SIPI would be right. The section you referenced has a note like
this:

  [...] If a SMI is received while an application processor is in the
  wait for SIPI mode, the SMI will be pended. The processor then
  responds on receipt of a SIPI by immediately servicing the pended SMI
  and going into SMM before handling the SIPI.

This might not play so well with SeaBIOS -- SeaBIOS will probably not
care about the pending SMI, but the OS after it might. I'm not sure what
an OS is expected to do to boot an AP.

Is INIT-SIPI-SIPI a requirement (emphasis on "INIT") for the OS? And
would that INIT clear the pending SMI "inherited" from SeaBIOS?

I wish we could do something simpler here... I still think that a QEMU
command line option would be safest. The configuration couldn't be
influenced by the guest, and responsibility would belong with higher
level components (libvirt, or human command line user). SMM is pretty
specialized already, so keeping the current behavior as default, and
requiring another QEMU option for broadcasting SMIs should be acceptable.

Thanks!
Laszlo

> 
> Best regards,
> 
> Mike
> 
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:[email protected]]
>> Sent: Monday, October 26, 2015 10:37 AM
>> To: Fan, Jeff; [email protected]
>> Cc: Kinney, Michael D
>> Subject: Re: [Patch 3/3] UefiCpuPkg/CpuDxe: Place APs into protected mode
>> when ExitBootService
>>
>> On 10/26/15 14:47, Jeff Fan wrote:
>>> In original ExitBootService callback function, we will send INIT IPI to all
>>> APs and place APs into wait-for-sipi state.
>>
>> Yes.
>>
>>> However, it prevents APs to
>>> response any SMI. As a result, SMM BSP needs to do a specified timeout
>> delay
>>> on each SMI.
>>
>> These two sentences are not precise. The problem is not that SMI is not
>> delivered  to those APs that are in wait-for-sipi.
>>
>> Instead, the problem is that the APs wake from the wait-for-sipi state,
>> execute the SMI handler, and then the RSM makes them return into
>> no-man's land.
>>
>> Perhaps this is not how physical hardware works, but this is what the
>> symptoms I'm seeing on TCG imply.
>>
>> So is *that* our problem perhaps? Namely, should the APs that wait for
>> SIPI simply not respond to the SMI?
>>
>>> This fix is to place APs into hlt-loop state to let APs response SMI 
>>> correctly.
>>
>> ... In our case: to allow the APs to return "somewhere" upon executing RSM.
>>
>> More comments below:
>>
>>> Moreover, we change AP mode into protected mode on x64 image. It could
>> avoid
>>> access page table when AP runs hlt-loop code, because page table is located
>> in
>>> boot service data range may re-used by OS.
>>>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jeff Fan <[email protected]>
>>> CC: Michael Kinney <[email protected]>
>>> CC: Laszlo Ersek <[email protected]>
>>> ---
>>>  UefiCpuPkg/CpuDxe/CpuDxe.inf       |  2 ++
>>>  UefiCpuPkg/CpuDxe/CpuMp.c          | 19 ++++++++++++++---
>>>  UefiCpuPkg/CpuDxe/CpuMp.h          | 12 +++++++++++
>>>  UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c | 37
>> +++++++++++++++++++++++++++++++++
>>>  UefiCpuPkg/CpuDxe/X64/ArchFuncs.c  | 42
>> ++++++++++++++++++++++++++++++++++++++
>>>  5 files changed, 109 insertions(+), 3 deletions(-)
>>>  create mode 100644 UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c
>>>  create mode 100644 UefiCpuPkg/CpuDxe/X64/ArchFuncs.c
>>>
>>> diff --git a/UefiCpuPkg/CpuDxe/CpuDxe.inf
>> b/UefiCpuPkg/CpuDxe/CpuDxe.inf
>>> index 9db5303..8ea4c38 100644
>>> --- a/UefiCpuPkg/CpuDxe/CpuDxe.inf
>>> +++ b/UefiCpuPkg/CpuDxe/CpuDxe.inf
>>> @@ -62,6 +62,7 @@
>>>    Ia32/MpAsm.asm  | MSFT
>>>    Ia32/MpAsm.asm  | INTEL
>>>    Ia32/MpAsm.nasm | GCC
>>> +  Ia32/ArchFuncs.c
>>>
>>>  [Sources.X64]
>>>    X64/CpuAsm.asm | MSFT
>>> @@ -70,6 +71,7 @@
>>>    X64/MpAsm.asm  | MSFT
>>>    X64/MpAsm.asm  | INTEL
>>>    X64/MpAsm.nasm | GCC
>>> +  X64/ArchFuncs.c
>>>
>>>  [Protocols]
>>>    gEfiCpuArchProtocolGuid                       ## PRODUCES
>>> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c
>>> index 04c2f1f..883b5d7 100644
>>> --- a/UefiCpuPkg/CpuDxe/CpuMp.c
>>> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c
>>> @@ -30,6 +30,8 @@ VOID *mApStackStart = 0;
>>>  volatile BOOLEAN mAPsAlreadyInitFinished = FALSE;
>>>  volatile BOOLEAN mStopCheckAllAPsStatus = TRUE;
>>>
>>> +volatile STARTUP_CODE *StartupCode;
>>> +
>>
>> This should be "extern".
>>
>> Normally both GCC and MSVC should reject linking this object file
>> together with ApStartup.obj. Namely:
>>
>> - ApStartup.c has
>>
>>  volatile STARTUP_CODE *StartupCode = NULL;
>>
>>  which, according to the C standard, is an "external definition".
>>
>> - CpuMp.c has the above. According to the C standard, that qualifies as
>>  a "tentative definition". Given this specific translation unit (i.e.,
>>  CpuMp.c), the definition is actually an external definition too,
>>  according to the C standard.
>>
>> Therefore both MSVC and GNU binutils are incorrect to link these two
>> object files together; they should be rejected.
>>
>> As a side point, I checked CpuMp.obj with "nm", both without and with
>> "extern". In the former case, the type of the symbol is "C" (common),
>> otherwise it is "U" (undefined). The practical end result is the same
>> (because "C" -- common -- behaves identically to "U" given the external
>> definition in ApStartup.c). Nonetheless, this is a bug
>> (standards-non-conformance) in both toolchains.
>>
>> But I digress.
>>
>>>  EFI_MP_SERVICES_PROTOCOL  mMpServicesTemplate = {
>>>    GetNumberOfProcessors,
>>>    GetProcessorInfo,
>>> @@ -1659,11 +1661,22 @@ ExitBootServicesCallback (
>>>    IN VOID                     *Context
>>>    )
>>>  {
>>> +  EFI_STATUS                 Status;
>>> +
>>>    //
>>> -  // Avoid APs access invalid buff datas which allocated by BootServices,
>>> -  // so we send INIT IPI to APs to let them wait for SIPI state.
>>> +  // Place AP in procted mode to run hlt-loop in ACPI NVS range.
>>> +  // It avoids OS crashed page table and this driver in boot service range.
>>>    //
>>> -  SendInitIpiAllExcludingSelf ();
>>> +  Status = mMpServicesTemplate.StartupAllAPs (
>>> +                        &mMpServicesTemplate,
>>> +                        ChangeApMode,
>>> +                        FALSE,
>>> +                        NULL,
>>> +                        0,
>>> +                        (VOID *) &StartupCode->CpuApRunHltLoopCode,
>>> +                        NULL
>>> +                        );
>>> +  ASSERT_EFI_ERROR (Status);
>>>  }
>>
>> * So, first of all I wanted to regression-test this change, so I built
>> OVMF without -D SMM_REQUIRE, and booted another X64 Fedora guest of
>> mine
>> with it, using QEMU machine type "pc" (i440fx) and KVM.
>>
>> Unfortunately, this change crashes the boot process with a triple fault,
>> according to the KVM trace. (I tested both with and without "extern".)
>>
>>  kvm_exit:             reason TRIPLE_FAULT rip 0x9f01f info 0 0
>>  ...
>>  kvm_userspace_exit:   reason KVM_EXIT_SHUTDOWN (8)
>>
>> * Second, I didn't think of using mMpServicesTemplate.StartupAllAPs()
>> for this purpose.
>>
>> I thought we'd use SendInitSipiSipiAllExcludingSelf(). But,
>> StartupAllAPs() seems reasonable I guess.
>>
>> Two points however:
>>
>> - I think we should wait in the ExitBootServicesCallback() function
>> until we are sure that the APs are running the HLT loop. Otherwise we
>> could return while some APs are still in our own idle loop (which is
>> "boot services code" type memory). This would probably require some
>> synchronization.
>>
>> - If there is only one processor (ie. no AP, just BSP), then Status will
>> be EFI_NOT_STARTED -- that should be accepted too:
>>
>>  ASSERT (Status == EFI_SUCCESS || Status == EFI_NOT_STARTED);
>>
>> More comments below:
>>
>>>
>>>  /**
>>> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.h b/UefiCpuPkg/CpuDxe/CpuMp.h
>>> index 907e993..3040b2e 100644
>>> --- a/UefiCpuPkg/CpuDxe/CpuMp.h
>>> +++ b/UefiCpuPkg/CpuDxe/CpuMp.h
>>> @@ -21,6 +21,7 @@
>>>  #include <Library/SynchronizationLib.h>
>>>  #include <Library/HobLib.h>
>>>  #include <Library/ReportStatusCodeLib.h>
>>> +#include <Library/DebugLib.h>
>>>
>>>  #pragma pack(1)
>>>
>>> @@ -763,5 +764,16 @@ SetMtrrsFromBuffer (
>>>    IN VOID *Buffer
>>>    );
>>>
>>> +/**
>>> +  This function will place AP to Hlt-Loop state.
>>> +
>>> +  @param  Buffer  Pointer to Hlt-Loop entry point.
>>> +**/
>>> +VOID
>>> +EFIAPI
>>> +ChangeApMode (
>>> +  IN  VOID     *Buffer
>>> +  );
>>> +
>>>  #endif // _CPU_MP_H_
>>>
>>> diff --git a/UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c
>> b/UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c
>>> new file mode 100644
>>> index 0000000..32bfab5
>>> --- /dev/null
>>> +++ b/UefiCpuPkg/CpuDxe/Ia32/ArchFuncs.c
>>> @@ -0,0 +1,37 @@
>>> +/** @file
>>> +  Arch specific functions.
>>> +
>>> +  Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
>>> +  This program and the accompanying materials
>>> +  are licensed and made available under the terms and conditions of the
>> BSD License
>>> +  which accompanies this distribution.  The full text of the license may be
>> found at
>>> +  http://opensource.org/licenses/bsd-license.php
>>> +
>>> +  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS"
>> BASIS,
>>> +  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER
>> EXPRESS OR IMPLIED.
>>> +
>>> +**/
>>> +
>>> +#include <CpuMp.h>
>>> +
>>> +/**
>>> +  This function will place AP to Hlt-Loop state.
>>> +
>>> +  @param  Buffer  Pointer to Hlt-Loop entry point.
>>> +**/
>>> +VOID
>>> +EFIAPI
>>> +ChangeApMode (
>>> +  IN  VOID     *Buffer
>>> +  )
>>> +{
>>> +  STACKLESS_AP_ENTRY_POINT     ApHltLoopFunc;
>>> +
>>> +  ApHltLoopFunc = (STACKLESS_AP_ENTRY_POINT) (UINTN) Buffer;
>>> +  (*ApHltLoopFunc) ();
>>
>> Shouldn't we use SwitchStack() instead?
>>
>> I'm only asking because I recall that S3ResumePei uses SwitchStack() for
>> non-local control transfer, when it doesn't need to switch CPU modes.
>>
>>> +
>>> +  //
>>> +  // If should never reach here
>>
>> "It"
>>
>>> +  //
>>> +  ASSERT (FALSE);
>>> +}
>>> diff --git a/UefiCpuPkg/CpuDxe/X64/ArchFuncs.c
>> b/UefiCpuPkg/CpuDxe/X64/ArchFuncs.c
>>> new file mode 100644
>>> index 0000000..4117f0a
>>> --- /dev/null
>>> +++ b/UefiCpuPkg/CpuDxe/X64/ArchFuncs.c
>>> @@ -0,0 +1,42 @@
>>> +/** @file
>>> +  Arch specific functions.
>>> +
>>> +  Copyright (c) 2015, Intel Corporation. All rights reserved.<BR>
>>> +  This program and the accompanying materials
>>> +  are licensed and made available under the terms and conditions of the
>> BSD License
>>> +  which accompanies this distribution.  The full text of the license may be
>> found at
>>> +  http://opensource.org/licenses/bsd-license.php
>>> +
>>> +  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS"
>> BASIS,
>>> +  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER
>> EXPRESS OR IMPLIED.
>>> +
>>> +**/
>>> +
>>> +#include <CpuMp.h>
>>> +#include <CpuGdt.h>
>>> +
>>> +/**
>>> +  This function will place AP to Hlt-Loop state.
>>> +
>>> +  @param  Buffer  Pointer to Hlt-Loop entry point.
>>> +**/
>>> +VOID
>>> +EFIAPI
>>> +ChangeApMode (
>>> +  IN  VOID     *Buffer
>>> +  )
>>> +{
>>> +  EFI_STATUS   Status;
>>> +
>>> +  AsmDisablePaging64 (
>>> +    SYS_CODE_SEL,
>>> +    (UINT32) (UINTN) Buffer,
>>> +    0,
>>> +    0,
>>> +    (UINT32) (UINTN) &Status
>>> +    );
>>
>> (I certainly thought of using AsmDisablePaging64() for this, but I
>> couldn't figure out what to pass as "Cs". I can see SYS_CODE_SEL in
>> "UefiCpuPkg/CpuDxe/CpuGdt.h"... I would have never found that.)
>>
>> I have one comment here (although it might not matter much): if this
>> module is compiled for X64 DXE, I don't think it is guaranteed that the
>> address of the current stack (i.e., &Status) can be expressed in 32-bits.
>>
>> Shouldn't we allocate a small array in STARTUP_CODE, to be used as stack
>> here?
>>
>>> +  //
>>> +  // If should never reach here
>>
>> "It"
>>
>>> +  //
>>> +  ASSERT (FALSE);
>>> +}
>>>
>>
>> In any case, the biggest problem for me is that this doesn't seem to
>> work :( Soon after I hit Enter in GRUB, the guest crashes with a triple
>> fault.
>>
>> Here's an idea: how about using SendInitSipiSipiAllExcludingSelf()?
>> - no stack is necessary to set up (the HLT loop needs no stack)
>>
>> - no mode switching needed, code runs immediately in real mode
>>
>> - SendInitSipiSipiAllExcludingSelf() claims to return only after the
>>  IPI has been accepted by the target processors, so no need to wait /
>>  synchronize in the exit boot services callback until the APs leave
>>  "our" code (the idle loop).
>>
>> - It would take an independent, 4KB AcpiNVS allocation, independently
>>  of STARTUP_CODE, but that's easy to do.
>>
>> - An INIT-SIPI-SIPI sequence should also safely abort whatever job the
>>  APs were doing at the time of ExitBootServices().
>>
>> What do you think?
>>
>> Thanks!
>> Laszlo

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to