On 10/27/15 23:54, Jordan Justen wrote: > On 2015-10-27 05:20:12, Laszlo Ersek wrote: >> On 10/27/15 07:48, Jordan Justen wrote: >>> What about adding this code to mStartupCodeTemplate and sending the >>> APs there using the MP services? >> >> That was the idea in Jeff's earlier series, but it comes with a number >> of complications. The main one is that the HLT loop is entered as a >> "normal" StartupAllAPs() callback function. That necessitates tearing >> down parts of the environment that the MP services implementation builds >> up carefully at initialization. Other parts of that laborously >> constructed environment are plain irrelevant. > > Hmm. I assumed that starting an AP task should be much lower latency > than running INIT-SIPI-SIPI. Yet, examining the current code, it looks > like we generally call CpuSleep on the AP, and we only currently > support resuming from this state with an INIT-SIPI-SIPI. > > I think the APs could be woken from CPU sleep by an interrupt IPI, > which should be a lot less traumatic than INIT-SIPI-SIPI. This > probably would require a stub interrupt table to be setup for the APs. > > With that in place, we could then send the APs to the cli-hlt loop > much faster than via INIT-SIPI-SIPI. > > (Especially given that apparently this can take up to 1 second?! :)
1 second seems to be an exaggeration (although I do trust that you are quoting it verbatim from a reliable source), especially for a virt platform. But, in any case: even if that interval reflects reality, the hit would be taken once per ExitBootServices(). I think that's pretty tolerable, considering that flash write access (i.e., setting variables) takes forever anyway. > Another reason it could be good to work with the code is that we > probably should put the code into a shutdown mode, so if future MP > service requests come in, the MP services code doesn't try to restart > the APs. (Ie, it should just return an immediate error.) The driver that provides the MP service, CpuDxe, is a DXE_DRIVER, not a runtime one. Its code, including the C-language idle loop for the APs, resides in BootServicesCode type memory. The driver does not exist after ExitBootServices(); no new requests can be made. >> Instead, this solution takes manual mode switching, paging, and stacks >> out of the picture, which is a good thing. The INIT-SIPI-SIPI sequence >> places the AP in real mode immediately, which is perfect for the minimal >> stub that we'd like the APs to run. > > I didn't consider the potential problem with leaving paging enabled, > but if we get the processor back into IA32 mode, we can leave the APs > running anywhere in the low 32-bit address space. It would be nice to > not to leave any memory allocated below 1MB if we can avoid it. I don't have any other counter-argument here than: this will have to be implemented by someone else. I tested Jeff's patches -- they didn't work, unfortunately, and because they involved mode switching / disablement of paging, it was much more feasible for me to write this different patch than to try to find the error in Jeff's patches. So, if someone implements the approach you are suggesting, I can only approve (as long as it will alllow SMM to work in OVMF, on KVM). And, until then, the SMM work is blocked. By the way, the EFI_D_VERBOSE message that I added with this patch prints the address of the page that contains the HLT loop. (I added that message in anticipation of crashes -- I wanted a way to correlate the crash RIP that I expected to find in the KVM trace with the OVMF debug log.) No crash occurred (it was incredible -- apparently I got it right at the first attempt), but now the debug message is proving useful anyway. Namely, most such allocations in edk2 work top-down (they pass AllocateMaxAddress). The address I see printed is: CpuDxe: ExitBootServicesCallback: HLT loop routine at 9E00:0000 which means that there is only this AcpiNVS page, and another at 0x9f000, allocated under 640KB, when we leave boot services. I think that should be tolerable. (The allocation request in this driver is made for any RAM under 1MB, but in QemuInitializeRam() we install RAM only up to 640 KB in that range. Which is why I claim "2 pages is all".) > >>> Regardless of that, I think the C-based-asm would better fit in >>> ApStartup.c. >> >> I can move the code of course, but to me it doesn't seem overly >> justified. The C-based ASM in this question is only used by the >> ExitBootServices() callback, so (as you can see it in the patch) I could >> keep the template STATIC. If I move it to ApStartup.c, I'll have to make >> it extern (or else do the copying outside of the callback). I feel that >> this isn't overly justified, but if you insist, I can do it. > > Ok. This is fine. I am not a fan of the C-base-asm, despite the fact > that I wrote it, and I still think it is better than separate assembly > files. (Having only NASM be used in EDK II might change my mind.) > > I guess I just don't want to see it proliferate too much. :) > >>> Why not just a cli-hlt loop? >> >> Because if the ExitBootServices() callback doesn't wait until all APs >> abort and abandon the MP services idle loop -- or worse, an assigned MP >> services job that is still running -- then the BSP could immediately >> return from ExitBootServices(), and proceed to third party code that: >> - overwrites the page that used to host the idle loop, or >> - raises an SMI (via a variable service call for example) and then the >> APs might return to the old idle loop upon seeing RSM. >> >> The synchronization is in place so that the callback returns knowing >> that all APs have abandoned the old idle loop (or whatever job they were >> running). > > We could verify all APs have set CpuStateBusy. By that time it seems > extremely likely that all APs will be running in the runtime safe > code. I'm not saying you are not right -- only that this is all I can contribute in this area of CpuDxe. The simplicity and robustness of my patch are compelling *for me* -- for the skills I have in such super-low level programming. If you don't like the "design" of my patch, I respect that, but then please, have at it. :) I'm certainly willing to test patches, and to provide feedback if I'm able. (This bug impacts physical platforms as well that pull in SMM, so I don't feel I'm primarily responsible for fixing it. I just tried to help out / expedite the dependent OVMF SMM work by attacking it, on a level where I can still move around with a modicum of safety.) Cheers Laszlo > > -Jordan > >> >>> Maybe if the MP services were used then >>> the separate counter goes away. >> >> I don't want it to go away. :) Plus, as far as I can see in >> StartupAllAPs() [UefiCpuPkg/CpuDxe/CpuMp.c], it doesn't wait for the >> *initiation* of the user procedure in any case. It can wait for the >> completion, but that's not good to us now -- our procedure is an >> infinite loop; it will not complete. All we care about is feedback that >> the old code has been abandoned completely. >> >>> How about 0 rather than 0x1234? >> >> I wanted something that would reflect the little endian encoding in a >> pure hexdump. I don't feel strongly about this though. >> >>> Can you provide a nasm source block in a comment like in ApStartup.c? >> >> Sure. >> >> Please follow up one by one on the points where I disagreed, so I can >> send v2. The two parts that I do feel strongly about are (a) not using >> MP services itself for this, (b) the counter / synchronization. >> >> Thanks >> Laszl >> >>> >>> -Jordan >>> >>> On 2015-10-26 14:33:09, Laszlo Ersek wrote: >>>> Our current callback puts the APs in wait-for-SIPI state, by sending them >>>> an INIT IPI. This works fine as long as the APs are not woken between >>>> ExitBootServices() and the time the runtime OS boots the APs. >>>> >>>> However, if a platform includes PiSmmCpuDxeSmm, and bases the runtime UEFI >>>> services (like the variable services) on SMM and SMIs, then two problem >>>> scenarios can be experienced: >>>> >>>> (a) On virtual platforms, the APs can be (spuriously?) woken by SMIs, >>>> between ExitBootServices() and the runtime OS's AP boot; for example >>>> by variable service calls. In this case the APs have no code to >>>> continue executing after the RSM instruction, which causes them to >>>> re-set. >>>> >>>> (b) On physical platforms, the APs might not wake from the wait-for-SIPI >>>> state at all, causing performance (or functional) problems with >>>> PiSmmCpuDxeSmm, dependent on the BSP/AP sync mode configured therein. >>>> >>>> Implement the following logic instead: >>>> >>>> - If there is no AP, there's no need for an ExitBootServices() callback. >>>> >>>> - Otherwise, reboot all APs with a minimal real-mode stub. Each AP shall >>>> bump a shared counter atomically at first, then spiral into an infinite >>>> HLT loop. This occurs regardless of what each AP has been doing at that >>>> moment. >>>> >>>> - APs can wake from HLT due to an SMI easily. >>>> >>>> - The BSP in the ExitBootServices() callback will wait until all APs have >>>> moved on to the stub. >>>> >>>> - Since we exit boot services time now with the original MP services >>>> startup code abandoned by all APs, we can even allocate that startup >>>> code in Boot Services Code type memory; only the new stub has to be >>>> AcpiNVS. >>>> >>>> Cc: Chen Fan <[email protected]> >>>> Cc: Jeff Fan <[email protected]> >>>> Cc: Jordan Justen <[email protected]> >>>> Cc: Michael Kinney <[email protected]> >>>> Contributed-under: TianoCore Contribution Agreement 1.0 >>>> Signed-off-by: Laszlo Ersek <[email protected]> >>>> --- >>>> >>>> Notes: >>>> So this would be option (2) from >>>> <http://thread.gmane.org/gmane.comp.bios.edk2.devel/3357/focus=3403>. >>>> With this patch in place, the AP reboots I had seen on TCG are gone. >>>> >>>> UefiCpuPkg/CpuDxe/ApStartup.c | 2 +- >>>> UefiCpuPkg/CpuDxe/CpuMp.c | 84 >>>> +++++++++++++++++++++++++++++++++++-------- >>>> 2 files changed, 71 insertions(+), 15 deletions(-) >>>> >>>> diff --git a/UefiCpuPkg/CpuDxe/ApStartup.c b/UefiCpuPkg/CpuDxe/ApStartup.c >>>> index 78fb26f..b8d736f 100644 >>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c >>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c >>>> @@ -382,7 +382,7 @@ PrepareAPStartupCode ( >>>> StartAddress = BASE_1MB; >>>> Status = gBS->AllocatePages ( >>>> AllocateMaxAddress, >>>> - EfiACPIMemoryNVS, >>>> + EfiBootServicesCode, >>>> EFI_SIZE_TO_PAGES (sizeof (*StartupCode)), >>>> &StartAddress >>>> ); >>>> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c >>>> index 04c2f1f..98976a1 100644 >>>> --- a/UefiCpuPkg/CpuDxe/CpuMp.c >>>> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c >>>> @@ -30,6 +30,33 @@ VOID *mApStackStart = 0; >>>> volatile BOOLEAN mAPsAlreadyInitFinished = FALSE; >>>> volatile BOOLEAN mStopCheckAllAPsStatus = TRUE; >>>> >>>> +#pragma pack (1) >>>> +typedef struct { >>>> + UINT8 MovAx; >>>> + UINT16 AxValue; >>>> + UINT8 MovDsAx[2]; >>>> + UINT8 LockedWordCounterIncrement[6]; >>>> + >>>> + UINT8 Cli; >>>> + UINT8 Hlt; >>>> + UINT8 JmpToCli[2]; >>>> + >>>> + UINT16 WordCounter; >>>> +} CPU_AP_HLT_LOOP; >>>> +#pragma pack () >>>> + >>>> +STATIC CONST CPU_AP_HLT_LOOP mCpuApHltLoopTemplate = { >>>> + 0xB8, 0x1234, // mov ax,0x1234 >>>> + { 0x8E, 0xD8 }, // mov ds,ax >>>> + { 0xF0, 0x3E, 0xFF, 0x06, 0x0F, 0x00 }, // lock inc word [ds:0xf] >>>> + >>>> + 0xFA, // cli @ offset 0xb >>>> + 0xF4, // hlt >>>> + { 0xEB, 0xFC }, // jmp short 0xb >>>> + >>>> + 0x0000 // WordCounter @ offset 0xf >>>> +}; >>>> + >>>> EFI_MP_SERVICES_PROTOCOL mMpServicesTemplate = { >>>> GetNumberOfProcessors, >>>> GetProcessorInfo, >>>> @@ -1659,11 +1686,31 @@ ExitBootServicesCallback ( >>>> IN VOID *Context >>>> ) >>>> { >>>> + CPU_AP_HLT_LOOP *CpuApHltLoop; >>>> + UINT16 ApCount; >>>> + >>>> + CpuApHltLoop = Context; >>>> + CopyMem (CpuApHltLoop, &mCpuApHltLoopTemplate, sizeof >>>> mCpuApHltLoopTemplate); >>>> + >>>> // >>>> - // Avoid APs access invalid buff datas which allocated by BootServices, >>>> - // so we send INIT IPI to APs to let them wait for SIPI state. >>>> + // Point DS via AX to the page where the routine has been allocated. >>>> // >>>> - SendInitIpiAllExcludingSelf (); >>>> + CpuApHltLoop->AxValue = (UINT16)((UINTN)Context >> 4); >>>> + DEBUG ((EFI_D_VERBOSE, "%a: %a: HLT loop routine at %04x:%04x\n", >>>> + gEfiCallerBaseName, __FUNCTION__, CpuApHltLoop->AxValue, 0)); >>>> + >>>> + // >>>> + // Force all APs to execute the HLT loop. Wait until they all enter the >>>> code. >>>> + // >>>> + SendInitSipiSipiAllExcludingSelf ((UINT32)(UINTN)Context); >>>> + >>>> + ApCount = (UINT16)(mMpSystemData.NumberOfProcessors - 1); >>>> + while (InterlockedCompareExchange16 ( >>>> + &CpuApHltLoop->WordCounter, >>>> + ApCount, >>>> + ApCount) != ApCount) { >>>> + CpuPause(); >>>> + } >>>> } >>>> >>>> /** >>>> @@ -1691,9 +1738,10 @@ InitializeMpSupport ( >>>> VOID >>>> ) >>>> { >>>> - EFI_STATUS Status; >>>> - MTRR_SETTINGS MtrrSettings; >>>> - UINTN Timeout; >>>> + EFI_STATUS Status; >>>> + MTRR_SETTINGS MtrrSettings; >>>> + UINTN Timeout; >>>> + EFI_PHYSICAL_ADDRESS CpuApHltLoopAddress; >>>> >>>> gMaxLogicalProcessorNumber = (UINTN) PcdGet32 >>>> (PcdCpuMaxLogicalProcessorNumber); >>>> if (gMaxLogicalProcessorNumber < 1) { >>>> @@ -1795,12 +1843,20 @@ InitializeMpSupport ( >>>> } >>>> } >>>> >>>> - Status = gBS->CreateEvent ( >>>> - EVT_SIGNAL_EXIT_BOOT_SERVICES, >>>> - TPL_CALLBACK, >>>> - ExitBootServicesCallback, >>>> - NULL, >>>> - &mExitBootServicesEvent >>>> - ); >>>> - ASSERT_EFI_ERROR (Status); >>>> + if (mMpSystemData.NumberOfProcessors > 1) { >>>> + CpuApHltLoopAddress = BASE_1MB - 1; >>>> + Status = gBS->AllocatePages (AllocateMaxAddress, EfiACPIMemoryNVS, >>>> + EFI_SIZE_TO_PAGES (sizeof (CPU_AP_HLT_LOOP)), >>>> + &CpuApHltLoopAddress); >>>> + ASSERT_EFI_ERROR (Status); >>>> + >>>> + Status = gBS->CreateEvent ( >>>> + EVT_SIGNAL_EXIT_BOOT_SERVICES, >>>> + TPL_CALLBACK, >>>> + ExitBootServicesCallback, >>>> + (VOID *)(UINTN)CpuApHltLoopAddress, >>>> + &mExitBootServicesEvent >>>> + ); >>>> + ASSERT_EFI_ERROR (Status); >>>> + } >>>> } >>>> -- >>>> 1.8.3.1 >>>> >> >> _______________________________________________ >> edk2-devel mailing list >> [email protected] >> https://lists.01.org/mailman/listinfo/edk2-devel _______________________________________________ edk2-devel mailing list [email protected] https://lists.01.org/mailman/listinfo/edk2-devel

