On 11/02/15 04:00, Fan, Jeff wrote: > Laszlo, > > Your updating patch looks better. But when I tried it on my platform, > Sometime, it hang at syncing MTRRs from BSP to Aps. I am > investigating it and will feedback you later.
Thanks. Actually, it has crossed my mind that *maybe* the recent KVM (i.e., Linux host kernel) changes do not introduce, just *expose*, a preexistent bug in the AP startup, when the latter runs in a VM, as part of OVMF. Perhaps we should audit the MP services code (and even its design?) from the ground up, against data races. I don't know the details (like the AP state diagram) at all; I'm suggesting this based on the recent patches only. Shared data should be clearly marked as such (and grouped separately from non-shared data); shared data should be accessed only under locks or with atomic helpers, locks should be as coarse as possible, and so on. (Fine-grained locks or outright lockless code are useful when there are many small requests, but that should not be our primary scenario. Plus, correctness comes before performance...) Thanks Laszlo > > Thanks! > Jeff > > -----Original Message----- > From: Laszlo Ersek [mailto:[email protected]] > Sent: Wednesday, October 28, 2015 8:16 PM > To: Fan, Jeff; [email protected] > Cc: Kinney, Michael D; Justen, Jordan L; Chen Fan > Subject: Re: [Patch 2/2] UefiCpuPkg/CpuDxe: Fix one dead lock issue in > ProcessorToIdleState() > > On 10/28/15 07:42, Jeff Fan wrote: >> There is one dead lock issue between BSP and AP. AP has one logic to >> enter into Sleep state (hlt) if it found its state is Idle state in one loop. >> In ProcessorToIdleState() function, AP will get current AP state, if >> its state is Sleep state, then it will go into real sleep(hlt) state. >> The dead lock may happen if BSP found AP state is Sleep state and send >> INIT-SIPI -SIPI to AP. But AP just acquried the lock to get its state. >> If AP is reset, the lock will not released correctly. >> This fix is to let AP enter into real sleep(hlt) state immediately >> after it changes its state is set to Sleep and needn't get lock again. >> >> Contributed-under: TianoCore Contribution Agreement 1.0 >> Signed-off-by: Jeff Fan <[email protected]> >> Cc: Michael Kinney <[email protected]> >> Cc: Jordan Justen <[email protected]> >> Cc: Laszlo Ersek <[email protected]> >> Cc: Chen Fan <[email protected]> >> --- >> UefiCpuPkg/CpuDxe/CpuMp.c | 12 ++++++++---- >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c >> index 3de2aa5..357ee42 100644 >> --- a/UefiCpuPkg/CpuDxe/CpuMp.c >> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c >> @@ -1289,14 +1289,18 @@ ProcessorToIdleState ( >> GetMpSpinLock (CpuData); >> if (CpuData->State == CpuStateIdle) { >> CpuData->State = CpuStateSleeping; >> + ReleaseMpSpinLock (CpuData); >> + while (TRUE) { >> + // >> + // Place AP into cli-hlt loop to handle RSM from SMM case >> + // >> + DisableInterrupts (); >> + CpuSleep (); >> + } >> } >> ReleaseMpSpinLock (CpuData); >> } >> >> - if (GetApState (CpuData) == CpuStateSleeping) { >> - CpuSleep (); >> - } >> - >> CpuPause (); >> } >> >> > > I think the idea in this patch is good, but I recommend a *much* more > sweeping change for this function. From my past multi-threaded programming > experience, it is always best to hold the lock *by default* in a critical > section, and only release it *temporarily* while yielding, or invoking a > callback, or invoking a blocking operation. > > - It makes no sense to release and reacquire a mutex or splinlock if our > thread or CPU has actual, fast work to do, without wanting to yield > intentionally. > > - For loops especially, this means that the loop body is both *entered* and > *exited* with the lock held. > > - This approach tends to improve throughput and happens to eliminate a > *bunch* of TOCTTOU races. > > Therefore my proposal would be: > >> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c >> index 98976a1..31c6222 100644 >> --- a/UefiCpuPkg/CpuDxe/CpuMp.c >> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c >> @@ -1288,40 +1288,42 @@ ProcessorToIdleState ( >> CpuData->Procedure = NULL; >> } >> CpuData->State = CpuStateIdle; >> - ReleaseMpSpinLock (CpuData); >> >> while (TRUE) { >> - GetMpSpinLock (CpuData); >> ProcedureArgument = CpuData->Parameter; >> Procedure = CpuData->Procedure; >> - ReleaseMpSpinLock (CpuData); >> >> if (Procedure != NULL) { >> - SetApState (CpuData, CpuStateBusy); >> + CpuData->State = CpuStateBusy; >> + ReleaseMpSpinLock (CpuData); >> >> Procedure ((VOID*) ProcedureArgument); >> >> GetMpSpinLock (CpuData); >> CpuData->Procedure = NULL; >> CpuData->State = CpuStateFinished; >> - ReleaseMpSpinLock (CpuData); >> } else { >> // >> // if no procedure to execution, we simply put AP >> // into sleeping state, and waiting BSP sent SIPI. >> // >> - GetMpSpinLock (CpuData); >> if (CpuData->State == CpuStateIdle) { >> - CpuData->State = CpuStateSleeping; >> + CpuData->State = CpuStateSleeping; >> + ReleaseMpSpinLock (CpuData); >> + >> + // >> + // wait for INIT-SIPI-SIPI >> + // >> + while (TRUE) { >> + DisableInterrupts (); >> + CpuSleep (); >> + } >> } >> - ReleaseMpSpinLock (CpuData); >> - } >> - >> - if (GetApState (CpuData) == CpuStateSleeping) { >> - CpuSleep (); >> } >> >> + ReleaseMpSpinLock (CpuData); >> CpuPause (); >> + GetMpSpinLock (CpuData); >> } >> >> CpuSleep (); > > Thank you, > Laszlo > _______________________________________________ > edk2-devel mailing list > [email protected] > https://lists.01.org/mailman/listinfo/edk2-devel > _______________________________________________ edk2-devel mailing list [email protected] https://lists.01.org/mailman/listinfo/edk2-devel

