V2 changes include:
1. Add more description for the code part which need easy to understand.
2. Refine some code base on feedback for V1 changes.

V1 changes include:
In a system which has multiple cores, current set register value task costs 
huge times.
After investigation, current set MSR task costs most of the times. Current 
logic uses
SpinLock to let set MSR task as an single thread task for all cores. Because 
MSR has
scope attribute which may cause GP fault if multiple APs set MSR at the same 
time,
current logic use an easiest solution (use SpinLock) to avoid this issue, but 
it will
cost huge times.

In order to fix this performance issue, new solution will set MSRs base on 
their scope
attribute. After this, the SpinLock will not needed. Without SpinLock, new 
issue raised
which is caused by MSR dependence. For example, MSR A depends on MSR B which 
means MSR A
must been set after MSR B has been set. Also MSR B is package scope level and 
MSR A is
thread scope level. If system has multiple threads, Thread 1 needs to set the 
thread level
MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for 
thread 1
and thread 2 like below:

            Thread 1                 Thread 2
MSR B          N                        Y
MSR A          Y                        Y

If driver don't control execute MSR order, for thread 1, it will execute MSR A 
first, but
at this time, MSR B not been executed yet by thread 2. system may trig 
exception at this
time.

In order to fix the above issue, driver introduces semaphore logic to control 
the MSR
execute sequence. For the above case, a semaphore will be add between MSR A and 
B for
all threads. Semaphore has scope info for it. The possible scope value is core 
or package.
For each thread, when it meets a semaphore during it set registers, it will 1) 
release
semaphore (+1) for each threads in this core or package(based on the scope info 
for this
semaphore) 2) acquire semaphore (-1) for all the threads in this core or 
package(based
on the scope info for this semaphore). With these two steps, driver can control 
MSR
sequence. Sample code logic like below:

  //
  // First increase semaphore count by 1 for processors in this package.
  //
  for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; 
ProcessorIndex ++) {
    LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + 
ProcessorIndex]);
  }
  //
  // Second, check whether the count has reach the check number.
  //
  for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) {
    LibWaitForSemaphore (&SemaphorePtr[ApOffset]);
  }

Platform Requirement:
1. This change requires register MSR setting base on MSR scope info. If still 
register MSR
   for all threads, exception may raised.

Known limitation:
1. Current CpuFeatures driver supports DXE instance and PEI instance. But 
semaphore logic
   requires Aps execute in async mode which is not supported by PEI driver. So 
CpuFeature
   PEI instance not works after this change. We plan to support async mode for 
PEI in phase
   2 for this task.

Cc: Ruiyu Ni <ruiyu...@intel.com>
Cc: Laszlo Ersek <ler...@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Eric Dong <eric.d...@intel.com>
---
 .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 438 ++++++++++++++++---
 .../DxeRegisterCpuFeaturesLib.c                    |  71 ++-
 .../DxeRegisterCpuFeaturesLib.inf                  |   3 +
 .../PeiRegisterCpuFeaturesLib.c                    |  55 ++-
 .../PeiRegisterCpuFeaturesLib.inf                  |   1 +
 .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h   |  61 ++-
 .../RegisterCpuFeaturesLib.c                       | 484 ++++++++++++++++++---
 7 files changed, 980 insertions(+), 133 deletions(-)

diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
index ba3fb3250f..2bf2254602 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/CpuFeaturesInitialize.c
@@ -145,6 +145,19 @@ CpuInitDataInitialize (
   CPU_FEATURES_INIT_ORDER              *InitOrder;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
   LIST_ENTRY                           *Entry;
+  UINT32                               Core;
+  UINT32                               Package;
+  UINT32                               Thread;
+  EFI_CPU_PHYSICAL_LOCATION            *Location;
+  BOOLEAN                              *CoresVisited;
+  UINTN                                Index;
+  ACPI_CPU_DATA                        *AcpiCpuData;
+  CPU_STATUS_INFORMATION               *CpuStatus;
+  UINT32                               *ValidCorePerPackage;
+
+  Core    = 0;
+  Package = 0;
+  Thread  = 0;
 
   CpuFeaturesData = GetCpuFeaturesData ();
   CpuFeaturesData->InitOrder = AllocateZeroPool (sizeof 
(CPU_FEATURES_INIT_ORDER) * NumberOfCpus);
@@ -163,6 +176,17 @@ CpuInitDataInitialize (
     Entry = Entry->ForwardLink;
   }
 
+  CpuFeaturesData->NumberOfCpus = (UINT32) NumberOfCpus;
+
+  AcpiCpuData = GetAcpiCpuData ();
+  ASSERT (AcpiCpuData != NULL);
+  CpuFeaturesData->AcpiCpuData= AcpiCpuData;
+
+  CpuStatus = &AcpiCpuData->CpuStatus;
+  Location = AllocateZeroPool (sizeof (EFI_CPU_PHYSICAL_LOCATION) * 
NumberOfCpus);
+  ASSERT (Location != NULL);
+  AcpiCpuData->ApLocation = (EFI_PHYSICAL_ADDRESS)(UINTN)Location;
+
   for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; ProcessorNumber++) 
{
     InitOrder = &CpuFeaturesData->InitOrder[ProcessorNumber];
     InitOrder->FeaturesSupportedMask = AllocateZeroPool 
(CpuFeaturesData->BitMaskSize);
@@ -175,7 +199,76 @@ CpuInitDataInitialize (
       &ProcessorInfoBuffer,
       sizeof (EFI_PROCESSOR_INFORMATION)
       );
+    CopyMem (
+      Location + ProcessorNumber,
+      &ProcessorInfoBuffer.Location,
+      sizeof (EFI_CPU_PHYSICAL_LOCATION)
+      );
+
+    //
+    // Collect CPU package count info.
+    //
+    if (Package < ProcessorInfoBuffer.Location.Package) {
+      Package = ProcessorInfoBuffer.Location.Package;
+    }
+    //
+    // Collect CPU max core count info.
+    //
+    if (Core < ProcessorInfoBuffer.Location.Core) {
+      Core = ProcessorInfoBuffer.Location.Core;
+    }
+    //
+    // Collect CPU max thread count info.
+    //
+    if (Thread < ProcessorInfoBuffer.Location.Thread) {
+      Thread = ProcessorInfoBuffer.Location.Thread;
+    }
   }
+  CpuStatus->PackageCount    = Package + 1;
+  CpuStatus->MaxCoreCount    = Core + 1;
+  CpuStatus->MaxThreadCount  = Thread + 1;
+  DEBUG ((DEBUG_INFO, "Processor Info: Package: %d, MaxCore : %d, MaxThread: 
%d\n",
+         CpuStatus->PackageCount,
+         CpuStatus->MaxCoreCount,
+         CpuStatus->MaxThreadCount));
+
+  //
+  // Collect valid core count in each package because not all cores are valid.
+  //
+  ValidCorePerPackage= AllocateZeroPool (sizeof (UINT32) * 
CpuStatus->PackageCount);
+  ASSERT (ValidCorePerPackage != 0);
+  CpuStatus->ValidCoresPerPackages = 
(EFI_PHYSICAL_ADDRESS)(UINTN)ValidCorePerPackage;
+  CoresVisited = AllocatePool (sizeof (UINT32) * CpuStatus->MaxCoreCount);
+  ASSERT (CoresVisited != NULL);
+
+  for (Index = 0; Index < CpuStatus->PackageCount; Index ++ ) {
+    ZeroMem (CoresVisited, sizeof (UINT32) * CpuStatus->MaxCoreCount);
+    //
+    // Collect valid cores in Current package.
+    //
+    for (ProcessorNumber = 0; ProcessorNumber < NumberOfCpus; 
ProcessorNumber++) {
+      Location = 
&CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo.ProcessorInfo.Location;
+      if (Location->Package == Index && !CoresVisited[Location->Core] ) {
+        //
+        // The ValidCores position for Location->Core is valid.
+        // The possible values in ValidCores[Index] are 0 or 1.
+        // FALSE means no valid threads in this Core.
+        // TRUE means have valid threads in this core, no matter the thead 
count is 1 or more.
+        //
+        CoresVisited[Location->Core] = TRUE;
+        ValidCorePerPackage[Index]++;
+      }
+    }
+  }
+  FreePool (CoresVisited);
+
+  for (Index = 0; Index <= Package; Index++) {
+    DEBUG ((DEBUG_INFO, "Package: %d, Valid Core : %d\n", Index, 
ValidCorePerPackage[Index]));
+  }
+
+  CpuFeaturesData->CpuFlags.SemaphoreCount = AllocateZeroPool (sizeof (UINT32) 
* CpuStatus->PackageCount * CpuStatus->MaxCoreCount * 
CpuStatus->MaxThreadCount);
+  ASSERT (CpuFeaturesData->CpuFlags.SemaphoreCount != NULL);
+
   //
   // Get support and configuration PCDs
   //
@@ -310,7 +403,7 @@ CollectProcessorData (
   LIST_ENTRY                           *Entry;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
 
-  CpuFeaturesData = GetCpuFeaturesData ();
+  CpuFeaturesData = (CPU_FEATURES_DATA *)Buffer;
   ProcessorNumber = GetProcessorIndex ();
   CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
   //
@@ -340,6 +433,33 @@ CollectProcessorData (
   }
 }
 
+/**
+  Get the string value for the input dependence type.
+
+  @param[in]  DependType  Input the dependence type.
+
+  @retval     Return the dependence type string.
+**/
+CHAR16 *
+GetDependTypeString (
+  IN CPU_FEATURE_DEPENDENCE_TYPE       DependType
+  )
+{
+  switch (DependType) {
+  case ThreadDepType:
+    return L"ThreadDepType";
+
+  case CoreDepType:
+    return L"CoreDepType";
+
+  case PackageDepType:
+    return L"PackageDepType";
+
+  default:
+    return L"NoneDepType";
+  }
+}
+
 /**
   Dump the contents of a CPU register table.
 
@@ -416,6 +536,15 @@ DumpRegisterTableOnProcessor (
         RegisterTableEntry->Value
         ));
       break;
+    case Semaphore:
+      DEBUG ((
+        DebugPrintErrorLevel,
+        "Processor: %d: Semaphore: Scope Value: %s\r\n",
+        ProcessorNumber,
+        GetDependTypeString 
((CPU_FEATURE_DEPENDENCE_TYPE)RegisterTableEntry->Value)
+        ));
+      break;
+
     default:
       break;
     }
@@ -441,6 +570,11 @@ AnalysisProcessorFeatures (
   REGISTER_CPU_FEATURE_INFORMATION     *CpuInfo;
   LIST_ENTRY                           *Entry;
   CPU_FEATURES_DATA                    *CpuFeaturesData;
+  LIST_ENTRY                           *NextEntry;
+  CPU_FEATURES_ENTRY                   *NextCpuFeatureInOrder;
+  BOOLEAN                              Success;
+  CPU_FEATURE_DEPENDENCE_TYPE          BeforeDep;
+  CPU_FEATURE_DEPENDENCE_TYPE          AfterDep;
 
   CpuFeaturesData = GetCpuFeaturesData ();
   CpuFeaturesData->CapabilityPcd = AllocatePool (CpuFeaturesData->BitMaskSize);
@@ -517,8 +651,14 @@ AnalysisProcessorFeatures (
     //
     CpuInfo = &CpuFeaturesData->InitOrder[ProcessorNumber].CpuInfo;
     Entry = GetFirstNode (&CpuInitOrder->OrderList);
+    NextEntry = Entry->ForwardLink;
     while (!IsNull (&CpuInitOrder->OrderList, Entry)) {
       CpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (Entry);
+      if (!IsNull (&CpuInitOrder->OrderList, NextEntry)) {
+        NextCpuFeatureInOrder = CPU_FEATURE_ENTRY_FROM_LINK (NextEntry);
+      } else {
+        NextCpuFeatureInOrder = NULL;
+      }
       if (IsBitMaskMatch (CpuFeatureInOrder->FeatureMask, 
CpuFeaturesData->SettingPcd)) {
         Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, 
CpuFeatureInOrder->ConfigData, TRUE);
         if (EFI_ERROR (Status)) {
@@ -532,6 +672,8 @@ AnalysisProcessorFeatures (
             DEBUG ((DEBUG_WARN, "Warning :: Failed to enable Feature: Mask = 
"));
             DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
           }
+        } else {
+          Success = TRUE;
         }
       } else {
         Status = CpuFeatureInOrder->InitializeFunc (ProcessorNumber, CpuInfo, 
CpuFeatureInOrder->ConfigData, FALSE);
@@ -542,9 +684,36 @@ AnalysisProcessorFeatures (
             DEBUG ((DEBUG_WARN, "Warning :: Failed to disable Feature: Mask = 
"));
             DumpCpuFeatureMask (CpuFeatureInOrder->FeatureMask);
           }
+        } else {
+          Success = TRUE;
         }
       }
-      Entry = Entry->ForwardLink;
+
+      if (Success) {
+        //
+        // If feature has dependence with the next feature (ONLY care 
core/package dependency).
+        // and feature initialize succeed, add sync semaphere here.
+        //
+        BeforeDep = DetectFeatureScope (CpuFeatureInOrder, TRUE);
+        if (NextCpuFeatureInOrder != NULL) {
+          AfterDep  = DetectFeatureScope (NextCpuFeatureInOrder, FALSE);
+        } else {
+          AfterDep = NoneDepType;
+        }
+        //
+        // Assume only one of the depend is valid.
+        //
+        ASSERT (!(BeforeDep > ThreadDepType && AfterDep > ThreadDepType));
+        if (BeforeDep > ThreadDepType) {
+          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, 
BeforeDep);
+        }
+        if (AfterDep > ThreadDepType) {
+          CPU_REGISTER_TABLE_WRITE32 (ProcessorNumber, Semaphore, 0, AfterDep);
+        }
+      }
+
+      Entry     = Entry->ForwardLink;
+      NextEntry = Entry->ForwardLink;
     }
 
     //
@@ -561,27 +730,116 @@ AnalysisProcessorFeatures (
   }
 }
 
+/**
+  Increment semaphore by 1.
+
+  @param      Sem            IN:  32-bit unsigned integer
+
+**/
+VOID
+LibReleaseSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  InterlockedIncrement (Sem);
+}
+
+/**
+  Decrement the semaphore by 1 if it is not zero.
+
+  Performs an atomic decrement operation for semaphore.
+  The compare exchange operation must be performed using
+  MP safe mechanisms.
+
+  @param      Sem            IN:  32-bit unsigned integer
+
+**/
+VOID
+LibWaitForSemaphore (
+  IN OUT  volatile UINT32           *Sem
+  )
+{
+  UINT32  Value;
+
+  do {
+    Value = *Sem;
+  } while (Value == 0 ||
+           InterlockedCompareExchange32 (
+             Sem,
+             Value,
+             Value - 1
+             ) != Value);
+}
+
+/**
+  Get the string value for the input register type.
+
+  @param[in]  RegisterType  Input the register type.
+
+  @retval     Return the register type string.
+**/
+CHAR16 *
+GetRegisterTypeString (
+  IN REGISTER_TYPE       RegisterType
+  )
+{
+  switch (RegisterType) {
+  case Msr:
+    return L"Msr";
+
+  case ControlRegister:
+    return L"ControlRegister";
+
+  case MemoryMapped:
+    return L"MemoryMapped";
+
+  case CacheControl:
+    return L"CacheControl";
+  
+  case Semaphore:
+    return L"Semaphore";
+
+  default:
+    ASSERT (FALSE);
+    return L"NoneType";
+  }
+}
+
 /**
   Initialize the CPU registers from a register table.
 
-  @param[in]  ProcessorNumber  The index of the CPU executing this function.
+  @param[in]  RegisterTable         The register table for this AP.
+  @param[in]  ApLocation            AP location info for this ap.
+  @param[in]  CpuStatus             CPU status info for this CPU.
+  @param[in]  CpuFlags              Flags data structure used when program the 
register.
 
   @note This service could be called by BSP/APs.
 **/
 VOID
 ProgramProcessorRegister (
-  IN UINTN  ProcessorNumber
+  IN CPU_REGISTER_TABLE           *RegisterTable,
+  IN EFI_CPU_PHYSICAL_LOCATION    *ApLocation,
+  IN CPU_STATUS_INFORMATION       *CpuStatus,
+  IN PROGRAM_CPU_REGISTER_FLAGS   *CpuFlags
   )
 {
-  CPU_FEATURES_DATA         *CpuFeaturesData;
-  CPU_REGISTER_TABLE        *RegisterTable;
   CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntry;
   UINTN                     Index;
   UINTN                     Value;
   CPU_REGISTER_TABLE_ENTRY  *RegisterTableEntryHead;
-
-  CpuFeaturesData = GetCpuFeaturesData ();
-  RegisterTable = &CpuFeaturesData->RegisterTable[ProcessorNumber];
+  volatile UINT32           *SemaphorePtr;
+  UINT32                    FirstThread;
+  UINT32                    PackageThreadsCount;
+  UINT32                    CurrentThread;
+  UINTN                     ProcessorIndex;
+  UINTN                     ThreadIndex;
+  UINTN                     ValidThreadCount;
+  UINT32                    *ValidCorePerPackage;
+
+  ThreadIndex = ApLocation->Package * CpuStatus->MaxCoreCount * 
CpuStatus->MaxThreadCount \
+            + ApLocation->Core * CpuStatus->MaxThreadCount \
+            + ApLocation->Thread;
 
   //
   // Traverse Register Table of this logical processor
@@ -591,6 +849,13 @@ ProgramProcessorRegister (
   for (Index = 0; Index < RegisterTable->TableLength; Index++) {
 
     RegisterTableEntry = &RegisterTableEntryHead[Index];
+    DEBUG ((
+      DEBUG_INFO,
+      "Processor = %ul, Entry Index %ul, Type = %s!\n",
+      (UINT64)ThreadIndex,
+      (UINT64)Index,
+      GetRegisterTypeString((REGISTER_TYPE)RegisterTableEntry->RegisterType)
+      ));
 
     //
     // Check the type of specified register
@@ -654,10 +919,6 @@ ProgramProcessorRegister (
     // The specified register is Model Specific Register
     //
     case Msr:
-      //
-      // Get lock to avoid Package/Core scope MSRs programming issue in 
parallel execution mode
-      //
-      AcquireSpinLock (&CpuFeaturesData->MsrLock);
       if (RegisterTableEntry->ValidBitLength >= 64) {
         //
         // If length is not less than 64 bits, then directly write without 
reading
@@ -677,20 +938,19 @@ ProgramProcessorRegister (
           RegisterTableEntry->Value
           );
       }
-      ReleaseSpinLock (&CpuFeaturesData->MsrLock);
       break;
     //
     // MemoryMapped operations
     //
     case MemoryMapped:
-      AcquireSpinLock (&CpuFeaturesData->MemoryMappedLock);
+      AcquireSpinLock (&CpuFlags->MemoryMappedLock);
       MmioBitFieldWrite32 (
         (UINTN)(RegisterTableEntry->Index | LShiftU64 
(RegisterTableEntry->HighIndex, 32)),
         RegisterTableEntry->ValidBitStart,
         RegisterTableEntry->ValidBitStart + RegisterTableEntry->ValidBitLength 
- 1,
         (UINT32)RegisterTableEntry->Value
         );
-      ReleaseSpinLock (&CpuFeaturesData->MemoryMappedLock);
+      ReleaseSpinLock (&CpuFlags->MemoryMappedLock);
       break;
     //
     // Enable or disable cache
@@ -706,6 +966,77 @@ ProgramProcessorRegister (
       }
       break;
 
+    case Semaphore:
+      // Semaphore works logic like below:
+      //
+      //  V(x) = LibReleaseSemaphore (Semaphore[FirstThread + x]);
+      //  P(x) = LibWaitForSemaphore (Semaphore[FirstThread + x]);
+      //
+      //  All threads (T0...Tn) waits in P() line and continues running
+      //  together.
+      //
+      //
+      //  T0             T1            ...           Tn
+      //
+      //  V(0...n)       V(0...n)      ...           V(0...n)
+      //  n * P(0)       n * P(1)      ...           n * P(n)
+      //
+      SemaphorePtr = CpuFlags->SemaphoreCount;
+      switch (RegisterTableEntry->Value) {
+      case CoreDepType:
+        //
+        // Get Offset info for the first thread in the core which current 
thread belongs to.
+        //
+        FirstThread = (ApLocation->Package * CpuStatus->MaxCoreCount + 
ApLocation->Core) * CpuStatus->MaxThreadCount;
+        CurrentThread = FirstThread + ApLocation->Thread;
+        //
+        // First Notify all threads in current Core that this thread has ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; 
ProcessorIndex ++) {
+          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[FirstThread + 
ProcessorIndex]);
+        }
+        //
+        // Second, check whether all valid threads in current core have ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < CpuStatus->MaxThreadCount; 
ProcessorIndex ++) {
+          LibWaitForSemaphore (&SemaphorePtr[CurrentThread]);
+        }
+        break;
+
+      case PackageDepType:
+        ValidCorePerPackage = (UINT32 
*)(UINTN)CpuStatus->ValidCoresPerPackages;
+        //
+        // Get Offset info for the first thread in the package which current 
thread belongs to.
+        //
+        FirstThread = ApLocation->Package * CpuStatus->MaxCoreCount * 
CpuStatus->MaxThreadCount;
+        //
+        // Get the possible threads count for current package.
+        //
+        PackageThreadsCount = CpuStatus->MaxThreadCount * 
CpuStatus->MaxCoreCount;
+        CurrentThread = FirstThread + CpuStatus->MaxThreadCount * 
ApLocation->Core + ApLocation->Thread;
+        //
+        // Get the valid thread count for current package.
+        //
+        ValidThreadCount = CpuStatus->MaxThreadCount * 
ValidCorePerPackage[ApLocation->Package];
+        //
+        // First Notify all threads in current package that this thread has 
ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; 
ProcessorIndex ++) {
+          LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[FirstThread + 
ProcessorIndex]);
+        }
+        //
+        // Second, check whether all valid threads in current package have 
ready.
+        //
+        for (ProcessorIndex = 0; ProcessorIndex < ValidThreadCount; 
ProcessorIndex ++) {
+          LibWaitForSemaphore (&SemaphorePtr[CurrentThread]);
+        }
+        break;
+
+      default:
+        break;
+      }
+      break;
+
     default:
       break;
     }
@@ -724,10 +1055,36 @@ SetProcessorRegister (
   IN OUT VOID            *Buffer
   )
 {
-  UINTN                  ProcessorNumber;
+  CPU_FEATURES_DATA         *CpuFeaturesData;
+  CPU_REGISTER_TABLE        *RegisterTable;
+  CPU_REGISTER_TABLE        *RegisterTables;
+  UINT32                    InitApicId;
+  UINTN                     ProcIndex;
+  UINTN                     Index;
+  ACPI_CPU_DATA             *AcpiCpuData;
 
-  ProcessorNumber = GetProcessorIndex ();
-  ProgramProcessorRegister (ProcessorNumber);
+  CpuFeaturesData = (CPU_FEATURES_DATA *) Buffer;
+  AcpiCpuData = CpuFeaturesData->AcpiCpuData;
+
+  RegisterTables = (CPU_REGISTER_TABLE *)(UINTN)AcpiCpuData->RegisterTable;
+
+  InitApicId = GetInitialApicId ();
+  RegisterTable = NULL;
+  for (Index = 0; Index < AcpiCpuData->NumberOfCpus; Index++) {
+    if (RegisterTables[Index].InitialApicId == InitApicId) {
+      RegisterTable =  &RegisterTables[Index];
+      ProcIndex = Index;
+      break;
+    }
+  }
+  ASSERT (RegisterTable != NULL);
+
+  ProgramProcessorRegister (
+    RegisterTable,
+    (EFI_CPU_PHYSICAL_LOCATION *)(UINTN)AcpiCpuData->ApLocation + ProcIndex,
+    &AcpiCpuData->CpuStatus,
+    &CpuFeaturesData->CpuFlags
+    );
 }
 
 /**
@@ -746,6 +1103,9 @@ CpuFeaturesDetect (
 {
   UINTN                  NumberOfCpus;
   UINTN                  NumberOfEnabledProcessors;
+  CPU_FEATURES_DATA      *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData();
 
   GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
 
@@ -754,49 +1114,13 @@ CpuFeaturesDetect (
   //
   // Wakeup all APs for data collection.
   //
-  StartupAPsWorker (CollectProcessorData);
+  StartupAPsWorker (CollectProcessorData, NULL);
 
   //
   // Collect data on BSP
   //
-  CollectProcessorData (NULL);
+  CollectProcessorData (CpuFeaturesData);
 
   AnalysisProcessorFeatures (NumberOfCpus);
 }
 
-/**
-  Performs CPU features Initialization.
-
-  This service will invoke MP service to perform CPU features
-  initialization on BSP/APs per user configuration.
-
-  @note This service could be called by BSP only.
-**/
-VOID
-EFIAPI
-CpuFeaturesInitialize (
-  VOID
-  )
-{
-  CPU_FEATURES_DATA      *CpuFeaturesData;
-  UINTN                  OldBspNumber;
-
-  CpuFeaturesData = GetCpuFeaturesData ();
-
-  OldBspNumber = GetProcessorIndex();
-  CpuFeaturesData->BspNumber = OldBspNumber;
-  //
-  // Wakeup all APs for programming.
-  //
-  StartupAPsWorker (SetProcessorRegister);
-  //
-  // Programming BSP
-  //
-  SetProcessorRegister (NULL);
-  //
-  // Switch to new BSP if required
-  //
-  if (CpuFeaturesData->BspNumber != OldBspNumber) {
-    SwitchNewBsp (CpuFeaturesData->BspNumber);
-  }
-}
diff --git 
a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
index 1f34a3f489..8346f7004f 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.c
@@ -15,6 +15,7 @@
 #include <PiDxe.h>
 
 #include <Library/UefiBootServicesTableLib.h>
+#include <Library/UefiLib.h>
 
 #include "RegisterCpuFeatures.h"
 
@@ -115,14 +116,20 @@ GetProcessorInformation (
 
   @param[in]  Procedure               A pointer to the function to be run on
                                       enabled APs of the system.
+  @param[in]  MpEvent                 A pointer to the event to be used later
+                                      to check whether procedure has done.
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   )
 {
   EFI_STATUS                           Status;
   EFI_MP_SERVICES_PROTOCOL             *MpServices;
+  CPU_FEATURES_DATA                    *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
 
   MpServices = GetMpProtocol ();
   //
@@ -132,9 +139,9 @@ StartupAPsWorker (
                  MpServices,
                  Procedure,
                  FALSE,
-                 NULL,
+                 (EFI_EVENT)MpEvent,
                  0,
-                 NULL,
+                 CpuFeaturesData,
                  NULL
                  );
   ASSERT_EFI_ERROR (Status);
@@ -197,3 +204,61 @@ GetNumberOfProcessor (
   ASSERT_EFI_ERROR (Status);
 }
 
+/**
+  Performs CPU features Initialization.
+
+  This service will invoke MP service to perform CPU features
+  initialization on BSP/APs per user configuration.
+
+  @note This service could be called by BSP only.
+**/
+VOID
+EFIAPI
+CpuFeaturesInitialize (
+  VOID
+  )
+{
+  CPU_FEATURES_DATA          *CpuFeaturesData;
+  UINTN                      OldBspNumber;
+  EFI_EVENT                  MpEvent;
+  EFI_STATUS                 Status;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
+
+  OldBspNumber = GetProcessorIndex();
+  CpuFeaturesData->BspNumber = OldBspNumber;
+
+  Status = gBS->CreateEvent (
+                  EVT_NOTIFY_WAIT,
+                  TPL_CALLBACK,
+                  EfiEventEmptyFunction,
+                  NULL,
+                  &MpEvent
+                  );
+  ASSERT_EFI_ERROR (Status);
+
+  //
+  // Wakeup all APs for programming.
+  //
+  StartupAPsWorker (SetProcessorRegister, MpEvent);
+  //
+  // Programming BSP
+  //
+  SetProcessorRegister (CpuFeaturesData);
+
+  //
+  // Wait all processors to finish the task.
+  //
+  do {
+    Status = gBS->CheckEvent (MpEvent);
+  } while (Status == EFI_NOT_READY);
+  ASSERT_EFI_ERROR (Status);
+
+  //
+  // Switch to new BSP if required
+  //
+  if (CpuFeaturesData->BspNumber != OldBspNumber) {
+    SwitchNewBsp (CpuFeaturesData->BspNumber);
+  }
+}
+
diff --git 
a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
index f0f317c945..6693bae575 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/DxeRegisterCpuFeaturesLib.inf
@@ -47,6 +47,9 @@
   SynchronizationLib
   UefiBootServicesTableLib
   IoLib
+  UefiBootServicesTableLib
+  UefiLib
+  LocalApicLib
 
 [Protocols]
   gEfiMpServiceProtocolGuid                                            ## 
CONSUMES
diff --git 
a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
index 82fe268812..799864a136 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.c
@@ -149,11 +149,15 @@ GetProcessorInformation (
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   )
 {
   EFI_STATUS                           Status;
   EFI_PEI_MP_SERVICES_PPI              *CpuMpPpi;
+  CPU_FEATURES_DATA                    *CpuFeaturesData;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
 
   //
   // Get MP Services Protocol
@@ -175,7 +179,7 @@ StartupAPsWorker (
                  Procedure,
                  FALSE,
                  0,
-                 NULL
+                 CpuFeaturesData
                  );
   ASSERT_EFI_ERROR (Status);
 }
@@ -257,3 +261,50 @@ GetNumberOfProcessor (
                          );
   ASSERT_EFI_ERROR (Status);
 }
+
+/**
+  Performs CPU features Initialization.
+
+  This service will invoke MP service to perform CPU features
+  initialization on BSP/APs per user configuration.
+
+  @note This service could be called by BSP only.
+**/
+VOID
+EFIAPI
+CpuFeaturesInitialize (
+  VOID
+  )
+{
+  CPU_FEATURES_DATA          *CpuFeaturesData;
+  UINTN                      OldBspNumber;
+
+  CpuFeaturesData = GetCpuFeaturesData ();
+
+  OldBspNumber = GetProcessorIndex();
+  CpuFeaturesData->BspNumber = OldBspNumber;
+
+  //
+  // Known limitation: In PEI phase, CpuFeatures driver not
+  // support async mode execute tasks. So semaphore type
+  // register can't been used for this instance, must use
+  // DXE type instance.
+  //
+
+  //
+  // Wakeup all APs for programming.
+  //
+  StartupAPsWorker (SetProcessorRegister, NULL);
+  //
+  // Programming BSP
+  //
+  SetProcessorRegister (CpuFeaturesData);
+
+  //
+  // Switch to new BSP if required
+  //
+  if (CpuFeaturesData->BspNumber != OldBspNumber) {
+    SwitchNewBsp (CpuFeaturesData->BspNumber);
+  }
+}
+
diff --git 
a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
index fdfef98293..e95f01df0b 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/PeiRegisterCpuFeaturesLib.inf
@@ -49,6 +49,7 @@
   PeiServicesLib
   PeiServicesTablePointerLib
   IoLib
+  LocalApicLib
 
 [Ppis]
   gEfiPeiMpServicesPpiGuid                                             ## 
CONSUMES
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
index edd266934f..ad970a00b2 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeatures.h
@@ -23,6 +23,7 @@
 #include <Library/MemoryAllocationLib.h>
 #include <Library/SynchronizationLib.h>
 #include <Library/IoLib.h>
+#include <Library/LocalApicLib.h>
 
 #include <AcpiCpuData.h>
 
@@ -46,16 +47,26 @@ typedef struct {
   CPU_FEATURE_INITIALIZE       InitializeFunc;
   UINT8                        *BeforeFeatureBitMask;
   UINT8                        *AfterFeatureBitMask;
+  UINT8                        *CoreBeforeFeatureBitMask;
+  UINT8                        *CoreAfterFeatureBitMask;
+  UINT8                        *PackageBeforeFeatureBitMask;
+  UINT8                        *PackageAfterFeatureBitMask;
   VOID                         *ConfigData;
   BOOLEAN                      BeforeAll;
   BOOLEAN                      AfterAll;
 } CPU_FEATURES_ENTRY;
 
+//
+// Flags used when program the register.
+//
+typedef struct {
+  volatile UINTN           MemoryMappedLock;     // Spinlock used to program 
mmio
+  volatile UINT32          *SemaphoreCount;      // Semaphore used to program 
semaphore.
+} PROGRAM_CPU_REGISTER_FLAGS;
+
 typedef struct {
   UINTN                    FeaturesCount;
   UINT32                   BitMaskSize;
-  SPIN_LOCK                MsrLock;
-  SPIN_LOCK                MemoryMappedLock;
   LIST_ENTRY               FeatureList;
 
   CPU_FEATURES_INIT_ORDER  *InitOrder;
@@ -64,9 +75,14 @@ typedef struct {
   UINT8                    *ConfigurationPcd;
   UINT8                    *SettingPcd;
 
+  UINT32                   NumberOfCpus;
+  ACPI_CPU_DATA            *AcpiCpuData;
+
   CPU_REGISTER_TABLE       *RegisterTable;
   CPU_REGISTER_TABLE       *PreSmmRegisterTable;
   UINTN                    BspNumber;
+
+  PROGRAM_CPU_REGISTER_FLAGS  CpuFlags;
 } CPU_FEATURES_DATA;
 
 #define CPU_FEATURE_ENTRY_FROM_LINK(a) \
@@ -118,10 +134,13 @@ GetProcessorInformation (
 
   @param[in]  Procedure               A pointer to the function to be run on
                                       enabled APs of the system.
+  @param[in]  MpEvent                 A pointer to the event to be used later
+                                      to check whether procedure has done.
 **/
 VOID
 StartupAPsWorker (
-  IN  EFI_AP_PROCEDURE                 Procedure
+  IN  EFI_AP_PROCEDURE                 Procedure,
+  IN  VOID                             *MpEvent
   );
 
 /**
@@ -170,4 +189,40 @@ DumpCpuFeature (
   IN CPU_FEATURES_ENTRY  *CpuFeature
   );
 
+/**
+  Return feature dependence result.
+
+  @param[in]  CpuFeature        Pointer to CPU feature.
+  @param[in]  Before            Check before dependence or after.
+
+  @retval     return the dependence result.
+**/
+CPU_FEATURE_DEPENDENCE_TYPE
+DetectFeatureScope (
+  IN CPU_FEATURES_ENTRY         *CpuFeature,
+  IN BOOLEAN                    Before
+  );
+
+/**
+  Programs registers for the calling processor.
+
+  @param[in,out] Buffer  The pointer to private data buffer.
+
+**/
+VOID
+EFIAPI
+SetProcessorRegister (
+  IN OUT VOID            *Buffer
+  );
+
+/**
+  Return ACPI_CPU_DATA data.
+
+  @return  Pointer to ACPI_CPU_DATA data.
+**/
+ACPI_CPU_DATA *
+GetAcpiCpuData (
+  VOID
+  );
+
 #endif
diff --git a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c 
b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
index fa7e107e39..d395f98f32 100644
--- a/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
+++ b/UefiCpuPkg/Library/RegisterCpuFeaturesLib/RegisterCpuFeaturesLib.c
@@ -112,6 +112,302 @@ IsBitMaskMatchCheck (
   return FALSE;
 }
 
+/**
+  Return feature dependence result.
+
+  @param[in]  CpuFeature        Pointer to CPU feature.
+  @param[in]  Before            Check before dependence or after.
+
+  @retval     return the dependence result.
+**/
+CPU_FEATURE_DEPENDENCE_TYPE
+DetectFeatureScope (
+  IN CPU_FEATURES_ENTRY         *CpuFeature,
+  IN BOOLEAN                    Before
+  )
+{
+  if (Before) {
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      return PackageDepType;
+    }
+
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      return CoreDepType;
+    }
+
+    if (CpuFeature->BeforeFeatureBitMask != NULL) {
+      return ThreadDepType;
+    }
+
+    return NoneDepType;
+  }
+
+  if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+    return PackageDepType;
+  }
+
+  if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+    return CoreDepType;
+  }
+
+  if (CpuFeature->AfterFeatureBitMask != NULL) {
+    return ThreadDepType;
+  }
+
+  return NoneDepType;
+}
+
+/**
+  Clear dependence for the specified type.
+
+  @param[in]  CurrentFeature     Cpu feature need to clear.
+  @param[in]  Before             Before or after dependence relationship.
+
+**/
+VOID
+ClearFeatureScope (
+  IN CPU_FEATURES_ENTRY           *CpuFeature,
+  IN BOOLEAN                      Before
+  )
+{
+  if (Before) {
+    if (CpuFeature->BeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->BeforeFeatureBitMask);
+      CpuFeature->BeforeFeatureBitMask = NULL;
+    }
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->CoreBeforeFeatureBitMask);
+      CpuFeature->CoreBeforeFeatureBitMask = NULL;
+    }
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      FreePool (CpuFeature->PackageBeforeFeatureBitMask);
+      CpuFeature->PackageBeforeFeatureBitMask = NULL;
+    }
+  } else {
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->PackageAfterFeatureBitMask);
+      CpuFeature->PackageAfterFeatureBitMask = NULL;
+    }
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->CoreAfterFeatureBitMask);
+      CpuFeature->CoreAfterFeatureBitMask = NULL;
+    }
+    if (CpuFeature->AfterFeatureBitMask != NULL) {
+      FreePool (CpuFeature->AfterFeatureBitMask);
+      CpuFeature->AfterFeatureBitMask = NULL;
+    }
+  }
+}
+
+/**
+  Base on dependence relationship to asjust feature dependence.
+
+  ONLY when the feature before(or after) the find feature also has 
+  dependence with the find feature. In this case, driver need to base
+  on dependce relationship to decide how to insert current feature and
+  adjust the feature dependence.
+
+  @param[in]  PreviousFeature    CPU feature current before the find one.
+  @param[in]  CurrentFeature     Cpu feature need to adjust.
+  @param[in]  Before             Before or after dependence relationship.
+
+  @retval   TRUE   means the current feature dependence has been adjusted.
+
+  @retval   FALSE  means the previous feature dependence has been adjusted.
+                   or previous feature has no dependence with the find one.
+
+**/
+BOOLEAN
+AdjustFeaturesDependence (
+  IN OUT CPU_FEATURES_ENTRY         *PreviousFeature,
+  IN OUT CPU_FEATURES_ENTRY         *CurrentFeature,
+  IN     BOOLEAN                    Before
+  )
+{
+  CPU_FEATURE_DEPENDENCE_TYPE            PreDependType;
+  CPU_FEATURE_DEPENDENCE_TYPE            CurrentDependType;
+
+  PreDependType     = DetectFeatureScope(PreviousFeature, Before);
+  CurrentDependType = DetectFeatureScope(CurrentFeature, Before);
+
+  //
+  // If previous feature has no dependence with the find featue.
+  // return FALSE.
+  //
+  if (PreDependType == NoneDepType) {
+    return FALSE;
+  }
+
+  //
+  // If both feature have dependence, keep the one which needs use more 
+  // processors and clear the dependence for the other one.
+  //
+  if (PreDependType >= CurrentDependType) {
+    ClearFeatureScope (CurrentFeature, Before);
+    return TRUE;
+  } else {
+    ClearFeatureScope (PreviousFeature, Before);
+    return FALSE;
+  }
+}
+
+/**
+  Base on dependence relationship to asjust feature order.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  FindEntry          The entry this feature depend on.
+  @param[in]  CurrentEntry       The entry for this feature.
+  @param[in]  Before             Before or after dependence relationship.
+
+**/
+VOID
+AdjustEntry (
+  IN      LIST_ENTRY                *FeatureList,
+  IN OUT  LIST_ENTRY                *FindEntry,
+  IN OUT  LIST_ENTRY                *CurrentEntry,
+  IN      BOOLEAN                   Before
+  )
+{
+  LIST_ENTRY                *PreviousEntry;
+  CPU_FEATURES_ENTRY        *PreviousFeature;
+  CPU_FEATURES_ENTRY        *CurrentFeature;
+
+  //
+  // For CPU feature which has core or package type dependence, later code 
need to insert
+  // AcquireSpinLock/ReleaseSpinLock logic to sequency the execute order.
+  // So if driver finds both feature A and B need to execute before feature C, 
driver will
+  // base on dependence type of feature A and B to update the logic here.
+  // For example, feature A has package type dependence and feature B has core 
type dependence,
+  // because package type dependence need to wait for more processors which 
has strong dependence
+  // than core type dependence. So driver will adjust the feature order to B 
-> A -> C. and driver 
+  // will remove the feature dependence in feature B. 
+  // Driver just needs to make sure before feature C been executed, feature A 
has finished its task
+  // in all all thread. Feature A finished in all threads also means feature B 
have finshed in all
+  // threads.
+  //
+  if (Before) {
+    PreviousEntry = GetPreviousNode (FeatureList, FindEntry);
+  } else {

+    PreviousEntry = GetNextNode (FeatureList, FindEntry);
+  }
+
+  CurrentFeature  = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
+  RemoveEntryList (CurrentEntry);
+
+  if (IsNull (FeatureList, PreviousEntry)) {
+    //
+    // If not exist the previous or next entry, just insert the current entry.
+    //
+    if (Before) {
+      InsertTailList (FindEntry, CurrentEntry);
+    } else {
+      InsertHeadList (FindEntry, CurrentEntry);
+    }
+  } else {
+    //
+    // If exist the previous or next entry, need to check it before insert 
curent entry.
+    //
+    PreviousFeature = CPU_FEATURE_ENTRY_FROM_LINK (PreviousEntry);
+
+    if (AdjustFeaturesDependence (PreviousFeature, CurrentFeature, Before)) {
+      //
+      // Return TRUE means current feature dependence has been cleared and the 
previous
+      // feature dependence has been kept and used. So insert current feature 
before (or after)
+      // the previous feature.
+      //
+      if (Before) {
+        InsertTailList (PreviousEntry, CurrentEntry);
+      } else {
+        InsertHeadList (PreviousEntry, CurrentEntry);
+      }
+    } else {
+      if (Before) {
+        InsertTailList (FindEntry, CurrentEntry);
+      } else {
+        InsertHeadList (FindEntry, CurrentEntry);
+      }
+    }
+  }
+}

+
+/**
+  Checks and adjusts current CPU features per dependency relationship.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  CurrentEntry       Pointer to current checked CPU feature
+  @param[in]  FeatureMask        The feature bit mask.
+
+  @retval     return Swapped info.
+**/
+BOOLEAN
+InsertToBeforeEntry (
+  IN LIST_ENTRY              *FeatureList,
+  IN LIST_ENTRY              *CurrentEntry,
+  IN UINT8                   *FeatureMask
+  )
+{
+  LIST_ENTRY                 *CheckEntry;
+  CPU_FEATURES_ENTRY         *CheckFeature;
+  BOOLEAN                    Swapped;
+
+  Swapped = FALSE;
+
+  //
+  // Check all features dispatched before this entry
+  //
+  CheckEntry = GetFirstNode (FeatureList);
+  while (CheckEntry != CurrentEntry) {
+    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
+    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
+      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, TRUE);
+      Swapped = TRUE;
+      break;
+    }
+    CheckEntry = CheckEntry->ForwardLink;
+  }
+
+  return Swapped;
+}
+
+/**
+  Checks and adjusts current CPU features per dependency relationship.
+
+  @param[in]  FeatureList        Pointer to CPU feature list
+  @param[in]  CurrentEntry       Pointer to current checked CPU feature
+  @param[in]  FeatureMask        The feature bit mask.
+
+  @retval     return Swapped info.
+**/
+BOOLEAN
+InsertToAfterEntry (
+  IN LIST_ENTRY              *FeatureList,
+  IN LIST_ENTRY              *CurrentEntry,
+  IN UINT8                   *FeatureMask
+  )
+{
+  LIST_ENTRY                 *CheckEntry;
+  CPU_FEATURES_ENTRY         *CheckFeature;
+  BOOLEAN                    Swapped;
+
+  Swapped = FALSE;
+
+  //
+  // Check all features dispatched after this entry
+  //
+  CheckEntry = GetNextNode (FeatureList, CurrentEntry);
+  while (!IsNull (FeatureList, CheckEntry)) {
+    CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
+    if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, FeatureMask)) {
+      AdjustEntry (FeatureList, CheckEntry, CurrentEntry, FALSE);
+      Swapped = TRUE;
+      break;
+    }
+    CheckEntry = CheckEntry->ForwardLink;
+  }
+
+  return Swapped;
+}
+
 /**
   Checks and adjusts CPU features order per dependency relationship.
 
@@ -128,11 +424,13 @@ CheckCpuFeaturesDependency (
   CPU_FEATURES_ENTRY         *CheckFeature;
   BOOLEAN                    Swapped;
   LIST_ENTRY                 *TempEntry;
+  LIST_ENTRY                 *NextEntry;
 
   CurrentEntry = GetFirstNode (FeatureList);
   while (!IsNull (FeatureList, CurrentEntry)) {
     Swapped = FALSE;
     CpuFeature = CPU_FEATURE_ENTRY_FROM_LINK (CurrentEntry);
+    NextEntry = CurrentEntry->ForwardLink;
     if (CpuFeature->BeforeAll) {
       //
       // Check all features dispatched before this entry
@@ -153,6 +451,7 @@ CheckCpuFeaturesDependency (
         CheckEntry = CheckEntry->ForwardLink;
       }
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
@@ -179,60 +478,59 @@ CheckCpuFeaturesDependency (
         CheckEntry = CheckEntry->ForwardLink;
       }
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
 
     if (CpuFeature->BeforeFeatureBitMask != NULL) {
-      //
-      // Check all features dispatched before this entry
-      //
-      CheckEntry = GetFirstNode (FeatureList);
-      while (CheckEntry != CurrentEntry) {
-        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
-        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, 
CpuFeature->BeforeFeatureBitMask)) {
-          //
-          // If there is dependency, swap them
-          //
-          RemoveEntryList (CurrentEntry);
-          InsertTailList (CheckEntry, CurrentEntry);
-          Swapped = TRUE;
-          break;
-        }
-        CheckEntry = CheckEntry->ForwardLink;
-      }
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, 
CpuFeature->BeforeFeatureBitMask);
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
 
     if (CpuFeature->AfterFeatureBitMask != NULL) {
-      //
-      // Check all features dispatched after this entry
-      //
-      CheckEntry = GetNextNode (FeatureList, CurrentEntry);
-      while (!IsNull (FeatureList, CheckEntry)) {
-        CheckFeature = CPU_FEATURE_ENTRY_FROM_LINK (CheckEntry);
-        if (IsBitMaskMatchCheck (CheckFeature->FeatureMask, 
CpuFeature->AfterFeatureBitMask)) {
-          //
-          // If there is dependency, swap them
-          //
-          TempEntry = GetNextNode (FeatureList, CurrentEntry);
-          RemoveEntryList (CurrentEntry);
-          InsertHeadList (CheckEntry, CurrentEntry);
-          CurrentEntry = TempEntry;
-          Swapped = TRUE;
-          break;
-        }
-        CheckEntry = CheckEntry->ForwardLink;
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, 
CpuFeature->AfterFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, 
CpuFeature->CoreBeforeFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
       }
+    }
+
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, 
CpuFeature->CoreAfterFeatureBitMask);
       if (Swapped) {
+        CurrentEntry = NextEntry;
         continue;
       }
     }
-    //
-    // No swap happened, check the next feature
-    //
+
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      Swapped = InsertToBeforeEntry (FeatureList, CurrentEntry, 
CpuFeature->PackageBeforeFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      Swapped = InsertToAfterEntry (FeatureList, CurrentEntry, 
CpuFeature->PackageAfterFeatureBitMask);
+      if (Swapped) {
+        CurrentEntry = NextEntry;
+        continue;
+      }
+    }
+
     CurrentEntry = CurrentEntry->ForwardLink;
   }
 }
@@ -265,8 +563,7 @@ RegisterCpuFeatureWorker (
   CpuFeaturesData = GetCpuFeaturesData ();
   if (CpuFeaturesData->FeaturesCount == 0) {
     InitializeListHead (&CpuFeaturesData->FeatureList);
-    InitializeSpinLock (&CpuFeaturesData->MsrLock);
-    InitializeSpinLock (&CpuFeaturesData->MemoryMappedLock);
+    InitializeSpinLock (&CpuFeaturesData->CpuFlags.MemoryMappedLock);
     CpuFeaturesData->BitMaskSize = (UINT32) BitMaskSize;
   }
   ASSERT (CpuFeaturesData->BitMaskSize == BitMaskSize);
@@ -328,6 +625,31 @@ RegisterCpuFeatureWorker (
       }
       CpuFeatureEntry->AfterFeatureBitMask = CpuFeature->AfterFeatureBitMask;
     }
+    if (CpuFeature->CoreBeforeFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->CoreBeforeFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->CoreBeforeFeatureBitMask);
+      }
+      CpuFeatureEntry->CoreBeforeFeatureBitMask = 
CpuFeature->CoreBeforeFeatureBitMask;
+    }
+    if (CpuFeature->CoreAfterFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->CoreAfterFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->CoreAfterFeatureBitMask);
+      }
+      CpuFeatureEntry->CoreAfterFeatureBitMask = 
CpuFeature->CoreAfterFeatureBitMask;
+    }
+    if (CpuFeature->PackageBeforeFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->PackageBeforeFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->PackageBeforeFeatureBitMask);
+      }
+      CpuFeatureEntry->PackageBeforeFeatureBitMask = 
CpuFeature->PackageBeforeFeatureBitMask;
+    }
+    if (CpuFeature->PackageAfterFeatureBitMask != NULL) {
+      if (CpuFeatureEntry->PackageAfterFeatureBitMask != NULL) {
+        FreePool (CpuFeatureEntry->PackageAfterFeatureBitMask);
+      }
+      CpuFeatureEntry->PackageAfterFeatureBitMask = 
CpuFeature->PackageAfterFeatureBitMask;
+    }
+
     CpuFeatureEntry->BeforeAll = CpuFeature->BeforeAll;
     CpuFeatureEntry->AfterAll  = CpuFeature->AfterAll;
 
@@ -410,6 +732,8 @@ SetCpuFeaturesBitMask (
   @retval  RETURN_UNSUPPORTED       Registration of the CPU feature is not
                                     supported due to a circular dependency 
between
                                     BEFORE and AFTER features.
+  @retval  RETURN_NOT_READY         CPU feature PCD 
PcdCpuFeaturesUserConfiguration
+                                    not updated by Platform driver yet.
 
   @note This service could be called by BSP only.
 **/
@@ -431,12 +755,20 @@ RegisterCpuFeature (
   UINT8                      *FeatureMask;
   UINT8                      *BeforeFeatureBitMask;
   UINT8                      *AfterFeatureBitMask;
+  UINT8                      *CoreBeforeFeatureBitMask;
+  UINT8                      *CoreAfterFeatureBitMask;
+  UINT8                      *PackageBeforeFeatureBitMask;
+  UINT8                      *PackageAfterFeatureBitMask;
   BOOLEAN                    BeforeAll;
   BOOLEAN                    AfterAll;
 
-  FeatureMask          = NULL;
-  BeforeFeatureBitMask = NULL;
-  AfterFeatureBitMask  = NULL;
+  FeatureMask                 = NULL;
+  BeforeFeatureBitMask        = NULL;
+  AfterFeatureBitMask         = NULL;
+  CoreBeforeFeatureBitMask    = NULL;
+  CoreAfterFeatureBitMask     = NULL;
+  PackageBeforeFeatureBitMask  = NULL;
+  PackageAfterFeatureBitMask   = NULL;
   BeforeAll            = FALSE;
   AfterAll             = FALSE;
 
@@ -449,6 +781,10 @@ RegisterCpuFeature (
                     != (CPU_FEATURE_BEFORE | CPU_FEATURE_AFTER));
     ASSERT ((Feature & (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL))
                     != (CPU_FEATURE_BEFORE_ALL | CPU_FEATURE_AFTER_ALL));
+    ASSERT ((Feature & (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER))
+                    != (CPU_FEATURE_CORE_BEFORE | CPU_FEATURE_CORE_AFTER));
+    ASSERT ((Feature & (CPU_FEATURE_PACKAGE_BEFORE | 
CPU_FEATURE_PACKAGE_AFTER))
+                    != (CPU_FEATURE_PACKAGE_BEFORE | 
CPU_FEATURE_PACKAGE_AFTER));
     if (Feature < CPU_FEATURE_BEFORE) {
       BeforeAll = ((Feature & CPU_FEATURE_BEFORE_ALL) != 0) ? TRUE : FALSE;
       AfterAll  = ((Feature & CPU_FEATURE_AFTER_ALL) != 0) ? TRUE : FALSE;
@@ -459,6 +795,14 @@ RegisterCpuFeature (
       SetCpuFeaturesBitMask (&BeforeFeatureBitMask, Feature & 
~CPU_FEATURE_BEFORE, BitMaskSize);
     } else if ((Feature & CPU_FEATURE_AFTER) != 0) {
       SetCpuFeaturesBitMask (&AfterFeatureBitMask, Feature & 
~CPU_FEATURE_AFTER, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_CORE_BEFORE) != 0) {
+      SetCpuFeaturesBitMask (&CoreBeforeFeatureBitMask, Feature & 
~CPU_FEATURE_CORE_BEFORE, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_CORE_AFTER) != 0) {
+      SetCpuFeaturesBitMask (&CoreAfterFeatureBitMask, Feature & 
~CPU_FEATURE_CORE_AFTER, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_PACKAGE_BEFORE) != 0) {
+      SetCpuFeaturesBitMask (&PackageBeforeFeatureBitMask, Feature & 
~CPU_FEATURE_PACKAGE_BEFORE, BitMaskSize);
+    } else if ((Feature & CPU_FEATURE_PACKAGE_AFTER) != 0) {
+      SetCpuFeaturesBitMask (&PackageAfterFeatureBitMask, Feature & 
~CPU_FEATURE_PACKAGE_AFTER, BitMaskSize);
     }
     Feature = VA_ARG (Marker, UINT32);
   }
@@ -466,15 +810,19 @@ RegisterCpuFeature (
 
   CpuFeature = AllocateZeroPool (sizeof (CPU_FEATURES_ENTRY));
   ASSERT (CpuFeature != NULL);
-  CpuFeature->Signature            = CPU_FEATURE_ENTRY_SIGNATURE;
-  CpuFeature->FeatureMask          = FeatureMask;
-  CpuFeature->BeforeFeatureBitMask = BeforeFeatureBitMask;
-  CpuFeature->AfterFeatureBitMask  = AfterFeatureBitMask;
-  CpuFeature->BeforeAll            = BeforeAll;
-  CpuFeature->AfterAll             = AfterAll;
-  CpuFeature->GetConfigDataFunc    = GetConfigDataFunc;
-  CpuFeature->SupportFunc          = SupportFunc;
-  CpuFeature->InitializeFunc       = InitializeFunc;
+  CpuFeature->Signature                   = CPU_FEATURE_ENTRY_SIGNATURE;
+  CpuFeature->FeatureMask                 = FeatureMask;
+  CpuFeature->BeforeFeatureBitMask        = BeforeFeatureBitMask;
+  CpuFeature->AfterFeatureBitMask         = AfterFeatureBitMask;
+  CpuFeature->CoreBeforeFeatureBitMask    = CoreBeforeFeatureBitMask;
+  CpuFeature->CoreAfterFeatureBitMask     = CoreAfterFeatureBitMask;
+  CpuFeature->PackageBeforeFeatureBitMask = PackageBeforeFeatureBitMask;
+  CpuFeature->PackageAfterFeatureBitMask  = PackageAfterFeatureBitMask;
+  CpuFeature->BeforeAll                   = BeforeAll;
+  CpuFeature->AfterAll                    = AfterAll;
+  CpuFeature->GetConfigDataFunc           = GetConfigDataFunc;
+  CpuFeature->SupportFunc                 = SupportFunc;
+  CpuFeature->InitializeFunc              = InitializeFunc;
   if (FeatureName != NULL) {
     CpuFeature->FeatureName          = AllocatePool (CPU_FEATURE_NAME_SIZE);
     ASSERT (CpuFeature->FeatureName != NULL);
@@ -489,13 +837,12 @@ RegisterCpuFeature (
 }
 
 /**
-  Allocates boot service data to save ACPI_CPU_DATA.
+  Return ACPI_CPU_DATA data.
 
-  @return  Pointer to allocated ACPI_CPU_DATA.
+  @return  Pointer to ACPI_CPU_DATA data.
 **/
-STATIC
 ACPI_CPU_DATA *
-AllocateAcpiCpuData (
+GetAcpiCpuData (
   VOID
   )
 {
@@ -508,9 +855,20 @@ AllocateAcpiCpuData (
   UINTN                                Index;
   EFI_PROCESSOR_INFORMATION            ProcessorInfoBuffer;
 
+  AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
+  if (AcpiCpuData != NULL) {
+    return AcpiCpuData;
+  }
+
   AcpiCpuData  = AllocatePages (EFI_SIZE_TO_PAGES (sizeof (ACPI_CPU_DATA)));
   ASSERT (AcpiCpuData != NULL);
 
+  //
+  // Set PcdCpuS3DataAddress to the base address of the ACPI_CPU_DATA structure
+  //
+  Status = PcdSet64S (PcdCpuS3DataAddress, (UINT64)(UINTN)AcpiCpuData);
+  ASSERT_EFI_ERROR (Status);
+
   GetNumberOfProcessor (&NumberOfCpus, &NumberOfEnabledProcessors);
   AcpiCpuData->NumberOfCpus = (UINT32)NumberOfCpus;
 
@@ -606,7 +964,6 @@ CpuRegisterTableWriteWorker (
   IN UINT64                  Value
   )
 {
-  EFI_STATUS               Status;
   CPU_FEATURES_DATA        *CpuFeaturesData;
   ACPI_CPU_DATA            *AcpiCpuData;
   CPU_REGISTER_TABLE       *RegisterTable;
@@ -614,17 +971,8 @@ CpuRegisterTableWriteWorker (
 
   CpuFeaturesData = GetCpuFeaturesData ();
   if (CpuFeaturesData->RegisterTable == NULL) {
-    AcpiCpuData = (ACPI_CPU_DATA *) (UINTN) PcdGet64 (PcdCpuS3DataAddress);
-    if (AcpiCpuData == NULL) {
-      AcpiCpuData = AllocateAcpiCpuData ();
-      ASSERT (AcpiCpuData != NULL);
-      //
-      // Set PcdCpuS3DataAddress to the base address of the ACPI_CPU_DATA 
structure
-      //
-      Status = PcdSet64S (PcdCpuS3DataAddress, (UINT64)(UINTN)AcpiCpuData);
-      ASSERT_EFI_ERROR (Status);
-    }
-    ASSERT (AcpiCpuData->RegisterTable != 0);
+    AcpiCpuData = GetAcpiCpuData ();
+    ASSERT ((AcpiCpuData != NULL) && (AcpiCpuData->RegisterTable != 0));
     CpuFeaturesData->RegisterTable = (CPU_REGISTER_TABLE *) (UINTN) 
AcpiCpuData->RegisterTable;
     CpuFeaturesData->PreSmmRegisterTable = (CPU_REGISTER_TABLE *) (UINTN) 
AcpiCpuData->PreSmmInitRegisterTable;
   }
-- 
2.15.0.windows.1

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to