On 10/16/15 23:17, Kinney, Michael D wrote: > Laszlo, > > Here is a proposed fix on top of your v3 patch services to address a > race condition that was introduced by the v3 patch series call to > StartupAllAPs() with SingleThread set to TRUE in the MP > initialization. If the APs have not entered their idle loop before > StartupAllAPs() is called, then some of the APs will be in an > unexpected state, and StartupAllAPs() will hang. This is the hang > condition that is only seen when SingleThread parameter is set to > TRUE and StartupAllAPs() is called very shortly after > mAPsAlreadyInitFinished is set to TRUE that releases the APs to > complete their initialization. > > I added an internal function that checks to see if all APs are in the > sleeping state in the idle loop. On the first call to > StartupAllAPs(), this internal function is used to make sure the APs > are in a state that is compatible with StartupAllAPs() being used. I > put this check in the first call to StartupAllAPs(), so we do not > take the delay to wait for the APs unconditionally in the MP init > code. Other work can get done while the APs work their way to their > idle loop sleeping state. > > The one item remaining is to have a timeout with an ASSERT() if > timeout is triggered in first call to StartupAllAPs() waiting for the > APs to enter idle loop. > > I also reordered some of the actions InitializeMpSupport(), so the MP > Services Protocol and call to StartupAllAPs() are done as late as > possible to give APs more time to get to idle loop. > > Please merge this into your patch series.
Many thanks for this! I have two comments: First, I don't think that it is my patch "UefiCpuPkg: CpuDxe: broadcast MTRR changes to APs" that *introduces* the race condition, with the early call to StartupAllAps(). I believe that the race condition exists in the current code, and my patch only exposes it. Namely, my patch adds the StartupAllAps() call right before the MP services protocol interface is installed. Considering a case where my patch is not applied, but another driver registers a protocol notify callback for MP services, that callback would be invoked on the stack of InstallMultipleProtocolInterfaces(), while some of the APs might not have entered their idle loop yet. If the callback in that other driver located the protocol instance quickly, and called StartupAllAps() with similar arguments (which would be a valid thing to do), the same issue could be triggered. So I think the race is already there; the protocol interface gets installed too early. Which leads me to think that the code you add here under (!mStartupAllAPsCalled) should be executed unconditionally, right after "mAPsAlreadyInitFinished is set to TRUE that releases the APs to complete their initialization". We shouldn't install the MP services protocol interface until initialization is complete, and allows for any kinds of calls through that interface. Second, I think I'll put aside the SMM series for a short period, and work on a smaller UefiCpuPkg + OvmfPkg series. I'm thinking about six patches: (1) split out the crux of this patch (without the code rearrangement) and state that it fixes the existent (although mostly un-triggered) race condition (2) refine the above by introducing the timeout you mention. I'm thinking GetTimeInNanoSecond (GetPerformanceCounter ()) should be a good way to measure the elapsed time; it seems to be a pattern used elsewhere in the tree. This will also require a new PCD. What default would you suggest for the timeout? (3) add the rearrangement that gives more time to the APs to enter their idle loops (4) Include my MTRR broadcast patch (5) Separately, implement your other suggestion with gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds, from <http://thread.gmane.org/gmane.comp.bios.edk2.devel/2415/focus=141501>. (6) Xiao Guangrong provided us with the info for wording a commit for OvmfPkg where the timeout from (5) would be increased. Obviously I'll keep your authorship wherever appropriate. Once these are in, I'd resume working on the SMM series. ... Actually, what do you think about committing your v2 (or maybe v3) UefiCpuPkg series first? We've been trying to do many things in parallel, and it's getting a bit chaotic & hard to track. So I'm thinking: - your set goes in first - I'd post the series with the above 6 patches - once it's in, I'd return to the SMM series. What's your opinion? Thanks! Laszlo > > Best regards, > > Mike > > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Michael Kinney <[email protected]> > > > 24d1f93110e1f73c70ff3705950c119693328deb > UefiCpuPkg/CpuDxe/CpuMp.c | 93 > ++++++++++++++++++++++++++++++++++++++--------- > 1 file changed, 75 insertions(+), 18 deletions(-) > > diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c > index fbe43f5..4af0361 100644 > --- a/UefiCpuPkg/CpuDxe/CpuMp.c > +++ b/UefiCpuPkg/CpuDxe/CpuMp.c > @@ -29,6 +29,7 @@ VOID *mApStackStart = 0; > > volatile BOOLEAN mAPsAlreadyInitFinished = FALSE; > volatile BOOLEAN mStopCheckAllAPsStatus = TRUE; > +BOOLEAN mStartupAllAPsCalled = FALSE; > > EFI_MP_SERVICES_PROTOCOL mMpServicesTemplate = { > GetNumberOfProcessors, > @@ -313,6 +314,47 @@ CheckAndUpdateAllAPsToIdleState ( > } > > /** > + Check if all APs are in state CpuStateSleeping. > + > + Return TRUE if all APs are in the CpuStateSleeping state. Do not > + check the state of the BSP or any disabled APs. > + > + @retval TRUE All APs are in CpuStateSleeping state. > + @retval FALSE One or more APs are not in CpuStateSleeping state. > + > +**/ > +BOOLEAN > +CheckAllAPsSleeping ( > + VOID > + ) > +{ > + UINTN ProcessorNumber; > + CPU_DATA_BLOCK *CpuData; > + > + for (ProcessorNumber = 0; ProcessorNumber < > mMpSystemData.NumberOfProcessors; ProcessorNumber++) { > + CpuData = &mMpSystemData.CpuDatas[ProcessorNumber]; > + if (TestCpuStatusFlag (CpuData, PROCESSOR_AS_BSP_BIT)) { > + // > + // Skip BSP > + // > + continue; > + } > + > + if (!TestCpuStatusFlag (CpuData, PROCESSOR_ENABLED_BIT)) { > + // > + // Skip Disabled processors > + // > + continue; > + } > + > + if (GetApState (CpuData) != CpuStateSleeping) { > + return FALSE; > + } > + } > + return TRUE; > +} > + > +/** > If the timeout expires before all APs returns from Procedure, > we should forcibly terminate the executing AP and fill FailedList back > by StartupAllAPs(). > @@ -647,6 +689,18 @@ StartupAllAPs ( > // > mStopCheckAllAPsStatus = TRUE; > > + if (!mStartupAllAPsCalled) { > + // > + // If this is first call to StartAllAPs(), then > + // wait for all APs to enter idle loop. > + // > + while (!CheckAllAPsSleeping ()) { > + CpuPause(); > + } > + > + mStartupAllAPsCalled = TRUE; > + } > + > for (Number = 0; Number < mMpSystemData.NumberOfProcessors; Number++) { > CpuData = &mMpSystemData.CpuDatas[Number]; > if (TestCpuStatusFlag (CpuData, PROCESSOR_AS_BSP_BIT)) { > @@ -1700,6 +1754,9 @@ InitializeMpSupport ( > sizeof (CPU_DATA_BLOCK) * > mMpSystemData.NumberOfProcessors, > mMpSystemData.CpuDatas); > > + // > + // Release all APs to complete initialization and enter idle loop > + // > mAPsAlreadyInitFinished = TRUE; > > // > @@ -1707,6 +1764,23 @@ InitializeMpSupport ( > // > CollectBistDataFromHob (); > > + if (mMpSystemData.NumberOfProcessors > 1 && > mMpSystemData.NumberOfProcessors < gMaxLogicalProcessorNumber) { > + if (mApStackStart != NULL) { > + FreePages (mApStackStart, EFI_SIZE_TO_PAGES ( > + (gMaxLogicalProcessorNumber - > mMpSystemData.NumberOfProcessors) * > + gApStackSize)); > + } > + } > + > + Status = gBS->CreateEvent ( > + EVT_SIGNAL_EXIT_BOOT_SERVICES, > + TPL_CALLBACK, > + ExitBootServicesCallback, > + NULL, > + &mExitBootServicesEvent > + ); > + ASSERT_EFI_ERROR (Status); > + > // > // Synchronize MTRR settings to APs. > // > @@ -1721,28 +1795,11 @@ InitializeMpSupport ( > NULL // FailedCpuList > ); > ASSERT (Status == EFI_SUCCESS || Status == EFI_NOT_STARTED); > - > + > Status = gBS->InstallMultipleProtocolInterfaces ( > &mMpServiceHandle, > &gEfiMpServiceProtocolGuid, &mMpServicesTemplate, > NULL > ); > ASSERT_EFI_ERROR (Status); > - > - if (mMpSystemData.NumberOfProcessors > 1 && > mMpSystemData.NumberOfProcessors < gMaxLogicalProcessorNumber) { > - if (mApStackStart != NULL) { > - FreePages (mApStackStart, EFI_SIZE_TO_PAGES ( > - (gMaxLogicalProcessorNumber - > mMpSystemData.NumberOfProcessors) * > - gApStackSize)); > - } > - } > - > - Status = gBS->CreateEvent ( > - EVT_SIGNAL_EXIT_BOOT_SERVICES, > - TPL_CALLBACK, > - ExitBootServicesCallback, > - NULL, > - &mExitBootServicesEvent > - ); > - ASSERT_EFI_ERROR (Status); > } > > > > >> -----Original Message----- >> From: edk2-devel [mailto:[email protected]] On Behalf Of >> Laszlo Ersek >> Sent: Wednesday, October 14, 2015 3:26 PM >> To: [email protected] >> Subject: [edk2] [PATCH v3 04/52] UefiCpuPkg: CpuDxe: broadcast MTRR >> changes to APs >> >> The >> >> Quark_EDKII_v1.1.0/IA32FamilyCpuBasePkg/CpuArchDxe >> >> driver applies any MTRR changes to APs, if the EFI_MP_SERVICES_PROTOCOL >> is >> available. We should do the same. >> >> Additionally, the broadcast should occur at MP startup as well, not only >> when MTRR settings are changed. The inspiration is taken from >> >> Quark_EDKII_v1.1.0/IA32FamilyCpuBasePkg/CpuMpDxe/ >> >> (see the EarlyMpInit() function and its call sites in >> "ProcessorConfig.c"). >> >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Laszlo Ersek <[email protected]> >> Reviewed-by: Jeff Fan <[email protected]> >> --- >> >> Notes: >> v3: >> - Although Jeff reviewed this, I'm not committing it just yet, because >> contextually it is on top of Mike's pending patches. Picked up the R-b >> tag from Jeff though. No code changes. >> >> v2: >> - This patch replaces the following patches from v1: >> - UefiCpuPkg: CpuDxe: optionally save MTRR settings to AcpiNVS memory >> block >> - UefiCpuPkg: CpuDxe: broadcast MTRR changes to APs >> - UefiCpuPkg: CpuDxe: sync MTRR settings to APs at MP startup >> - UefiCpuPkg: CpuDxe: provide EFI_MP_SERVICES_PROTOCOL when there's >> no >> AP >> >> The first v1 patch was deemed inappropriate for general use, and Mike >> suggested a good alternative for OVMF (=> grab MTRR settings in >> CpuS3DataDxe at EndOfDxe). >> >> The second and third v1 patches are now squashed together into this v2 >> patch; they are small and belong together logically. >> >> The fourth v1 patch is redundant now; the same has been covered by >> Mike. >> >> UefiCpuPkg/CpuDxe/CpuMp.h | 13 ++++++++ >> UefiCpuPkg/CpuDxe/CpuDxe.c | 26 +++++++++++++++ >> UefiCpuPkg/CpuDxe/CpuMp.c | 34 +++++++++++++++++++- >> 3 files changed, 72 insertions(+), 1 deletion(-) >> >> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.h b/UefiCpuPkg/CpuDxe/CpuMp.h >> index d2866e4..503f3ae 100644 >> --- a/UefiCpuPkg/CpuDxe/CpuMp.h >> +++ b/UefiCpuPkg/CpuDxe/CpuMp.h >> @@ -643,5 +643,18 @@ ResetApStackless ( >> IN UINT32 ProcessorId >> ); >> >> +/** >> + A minimal wrapper function that allows MtrrSetAllMtrrs() to be passed to >> + EFI_MP_SERVICES_PROTOCOL.StartupAllAPs() as Procedure. >> + >> + @param[in] Buffer Pointer to an MTRR_SETTINGS object, to be passed to >> + MtrrSetAllMtrrs(). >> +**/ >> +VOID >> +EFIAPI >> +SetMtrrsFromBuffer ( >> + IN VOID *Buffer >> + ); >> + >> #endif // _CPU_MP_H_ >> >> diff --git a/UefiCpuPkg/CpuDxe/CpuDxe.c b/UefiCpuPkg/CpuDxe/CpuDxe.c >> index c9df4e1..daf97bd 100644 >> --- a/UefiCpuPkg/CpuDxe/CpuDxe.c >> +++ b/UefiCpuPkg/CpuDxe/CpuDxe.c >> @@ -350,6 +350,9 @@ CpuSetMemoryAttributes ( >> { >> RETURN_STATUS Status; >> MTRR_MEMORY_CACHE_TYPE CacheType; >> + EFI_STATUS MpStatus; >> + EFI_MP_SERVICES_PROTOCOL *MpService; >> + MTRR_SETTINGS MtrrSettings; >> >> if (!IsMtrrSupported ()) { >> return EFI_UNSUPPORTED; >> @@ -405,6 +408,29 @@ CpuSetMemoryAttributes ( >> CacheType >> ); >> >> + if (!RETURN_ERROR (Status)) { >> + MpStatus = gBS->LocateProtocol ( >> + &gEfiMpServiceProtocolGuid, >> + NULL, >> + (VOID **)&MpService >> + ); >> + // >> + // Synchronize the update with all APs >> + // >> + if (!EFI_ERROR (MpStatus)) { >> + MtrrGetAllMtrrs (&MtrrSettings); >> + MpStatus = MpService->StartupAllAPs ( >> + MpService, // This >> + SetMtrrsFromBuffer, // Procedure >> + TRUE, // SingleThread >> + NULL, // WaitEvent >> + 0, // TimeoutInMicrosecsond >> + &MtrrSettings, // ProcedureArgument >> + NULL // FailedCpuList >> + ); >> + ASSERT (MpStatus == EFI_SUCCESS || MpStatus == EFI_NOT_STARTED); >> + } >> + } >> return (EFI_STATUS) Status; >> } >> >> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c >> index da3686e..fbe43f5 100644 >> --- a/UefiCpuPkg/CpuDxe/CpuMp.c >> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c >> @@ -1626,6 +1626,22 @@ ExitBootServicesCallback ( >> } >> >> /** >> + A minimal wrapper function that allows MtrrSetAllMtrrs() to be passed to >> + EFI_MP_SERVICES_PROTOCOL.StartupAllAPs() as Procedure. >> + >> + @param[in] Buffer Pointer to an MTRR_SETTINGS object, to be passed to >> + MtrrSetAllMtrrs(). >> +**/ >> +VOID >> +EFIAPI >> +SetMtrrsFromBuffer ( >> + IN VOID *Buffer >> + ) >> +{ >> + MtrrSetAllMtrrs (Buffer); >> +} >> + >> +/** >> Initialize Multi-processor support. >> >> **/ >> @@ -1634,7 +1650,8 @@ InitializeMpSupport ( >> VOID >> ) >> { >> - EFI_STATUS Status; >> + EFI_STATUS Status; >> + MTRR_SETTINGS MtrrSettings; >> >> gMaxLogicalProcessorNumber = (UINTN) PcdGet32 >> (PcdCpuMaxLogicalProcessorNumber); >> if (gMaxLogicalProcessorNumber < 1) { >> @@ -1690,6 +1707,21 @@ InitializeMpSupport ( >> // >> CollectBistDataFromHob (); >> >> + // >> + // Synchronize MTRR settings to APs. >> + // >> + MtrrGetAllMtrrs (&MtrrSettings); >> + Status = mMpServicesTemplate.StartupAllAPs ( >> + &mMpServicesTemplate, // This >> + SetMtrrsFromBuffer, // Procedure >> + TRUE, // SingleThread >> + NULL, // WaitEvent >> + 0, // >> TimeoutInMicrosecsond >> + &MtrrSettings, // ProcedureArgument >> + NULL // FailedCpuList >> + ); >> + ASSERT (Status == EFI_SUCCESS || Status == EFI_NOT_STARTED); >> + >> Status = gBS->InstallMultipleProtocolInterfaces ( >> &mMpServiceHandle, >> &gEfiMpServiceProtocolGuid, &mMpServicesTemplate, >> -- >> 1.8.3.1 >> >> >> _______________________________________________ >> edk2-devel mailing list >> [email protected] >> https://lists.01.org/mailman/listinfo/edk2-devel _______________________________________________ edk2-devel mailing list [email protected] https://lists.01.org/mailman/listinfo/edk2-devel

