CpuDxe: Ignore extra APs in the system

Laszlo Ersek Thu, 29 Oct 2015 04:11:00 -0700

On 10/29/15 09:54, Fan, Jeff wrote:
> Jordan,
> 
> I think we could ASSERT() if the actually processors number is larger than 
> maximum number supported in system. 
> 
> Otherwise, send broadcast SMI or IPI may break the system.


- The base premise is that we don't know how many CPUs are in the
system. We can only count them by booting them all, and waiting for "a
while".

- We can stop waiting on three conditions:

  (a) We notice that the count stops incrementing after a while.
     This is very messy to implement because it requires timing the APs
     as they report in.

  (b) We stop waiting after a fixed interval elapses.
      This is the current status. It wastes time in some cases, and
      misses processors in other cases.

  (c) We stop waiting when we reach the supported maximum count.
      Implementing this is the main goal of the series.

- A direct consequence of (c) is that we won't know if there are *more*
processors than our supported maximum -- simply because we won't be
waiting and looking *longer* when we reach that count. "Waiting a bit
more" would immediately put us back to (a) or (b).

- This patch tries to cope with the unexpected, "excessive" VCPUs that
might escape us in method (c). It works as far as the MP services
implementation itself is conerned. However, as Jeff points out, not all
broadcast interrupts are required to be handled by ApEntryPointInC().
Case in point: the SMI handler is separate. Therefore I think this patch
provides a false sense of safety.

- Jeff's suggestion to ASSERT() when there are more CPUs than supported
is impossible to implement *on the BSP*. Again, that would require the
BSP to do (a) or (b), and that's expressly what we don't want to do when
the maximum supported count is reached.

- Jeff's suggestion can be implemented *on the APs*. An AP can notice
that it would bump the counter above the fixed limit. However, by that
time the BSP could be anywhere at all, and ASSERT()ing on the excessive
APs helps zilch -- the first such AP should stop *the BSP* wherever it is.

For the above reasons, I suggest:
- Either drop this patch (see "false sense of safety"),
- or change the AP logic in the initial counting routine to forcibly
  shut down the *entire* system, or at least halt the BSP in its
  tracks. (Maybe print an EFI_D_ERROR message too.)

I don't know how the second idea could be implemented (especially
without chipset knowledge, which is not appropriate for a CPU driver).
Personally I'd add a loud warning to the declaration of
PcdCpuMaxLogicalProcessorNumber, in the DEC file, and then drop this patch.

Consequently, physical platforms should set
PcdCpuMaxLogicalProcessorNumber to a fixed *large* number, and the
timeout to a fixed *low* number. Whereas virtual platforms should set
PcdCpuMaxLogicalProcessorNumber to the exact VCPU count dynamically, and
set the timeout to a fixed *very high* (= virtually infinite) number.

I suggest that this patch be dropped.

Thanks
Laszlo

> 
> Jeff
> -----Original Message-----
> From: edk2-devel [mailto:[email protected]] On Behalf Of Jordan 
> Justen
> Sent: Thursday, October 29, 2015 9:33 AM
> To: [email protected]
> Cc: Justen, Jordan L; Laszlo Ersek; Fan, Jeff
> Subject: [edk2] [PATCH 2/6] UefiCpuPkg/CpuDxe: Ignore extra APs in the system
> 
> The PcdCpuMaxLogicalProcessorNumber specifies the maximum number of logical 
> processors which are expected to be seen by the system. If more APs actually 
> are available in the system, we should prevent them from being used.
> 
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jordan Justen <[email protected]>
> Cc: Jeff Fan <[email protected]>
> Cc: Laszlo Ersek <[email protected]>
> ---
>  UefiCpuPkg/CpuDxe/CpuMp.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c index 
> 98fdfdf..e80835f 100644
> --- a/UefiCpuPkg/CpuDxe/CpuMp.c
> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c
> @@ -1196,6 +1196,17 @@ WhoAmI (
>      }
>    }
>  
> +  if (Index >= mMpSystemData.NumberOfProcessors) {
> +    //
> +    // This is not a valid error for the WhoAmI function, but it should never
> +    // happen from outside the driver. It could only happen if more APs
> +    // started than the PcdCpuMaxLogicalProcessorNumber was set to. This call
> +    // would come from ApEntryPointInC, and we use this error to prevent the
> +    // AP from being used by MP services.
> +    //
> +    return EFI_DEVICE_ERROR;
> +  }
> +
>    *ProcessorNumber = Index;
>    return EFI_SUCCESS;
>  }
> @@ -1446,10 +1457,15 @@ ApEntryPointInC (
>    VOID
>    )
>  {
> +  EFI_STATUS      Status;
>    VOID*           TopOfApStack;
>    UINTN           ProcessorNumber;
>  
>    if (!mAPsAlreadyInitFinished) {
> +    if (mMpSystemData.NumberOfProcessors >= gMaxLogicalProcessorNumber) {
> +      return;
> +    }
> +
>      FillInProcessorInformation (FALSE, mMpSystemData.NumberOfProcessors);
>      TopOfApStack  = (UINT8*)mApStackStart + gApStackSize;
>      mApStackStart = TopOfApStack;
> @@ -1461,7 +1477,11 @@ ApEntryPointInC (
>      mMpSystemData.NumberOfProcessors++;
>      mMpSystemData.NumberOfEnabledProcessors++;
>    } else {
> -    WhoAmI (&mMpServicesTemplate, &ProcessorNumber);
> +    Status = WhoAmI (&mMpServicesTemplate, &ProcessorNumber);
> +    if (EFI_ERROR (Status)) {
> +      return;
> +    }
> +
>      //
>      // Get the original stack address.
>      //
> --
> 2.5.1
> 
> _______________________________________________
> edk2-devel mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/edk2-devel
> 

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Re: [edk2] [PATCH 2/6] UefiCpuPkg/CpuDxe: Ignore extra APs in the system

Reply via email to