On 2017/12/29 8:48 PM, Ingo Molnar wrote:
>
> * Jia Zhang <qianyue...@alibaba-inc.com> wrote:
>
>>
>>
>> On 2017/12/28 8:24 PM, Ingo Molnar wrote:
>>>
>>> * Jia Zhang <qianyue...@alibaba-inc.com> wrote:
>>>
>>>> Instead of blacklisting all types of Broadwell processor when running
>>>> a late loading, only BDW-EP (signature 0x406f1, aka family 6, model 79,
>>>> stepping 1) with the microcode version less than 0x0b000021 needs to
>>>> be blacklisted.
>>>>
>>>> The erratum is documented in the the public documentation #334165 (See
>>>> the item BDF90 for details).
>>>>
>>>> Signed-off-by: Jia Zhang <qianyue...@alibaba-inc.com>
>>>> ---
>>>>  arch/x86/kernel/cpu/microcode/intel.c | 12 ++++++++++--
>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/microcode/intel.c
>>>> b/arch/x86/kernel/cpu/microcode/intel.c
>>>> index 8ccdca6..79cad85 100644
>>>> --- a/arch/x86/kernel/cpu/microcode/intel.c
>>>> +++ b/arch/x86/kernel/cpu/microcode/intel.c
>>>> @@ -910,8 +910,16 @@ static bool is_blacklisted(unsigned int cpu)
>>>>  {
>>>>  	struct cpuinfo_x86 *c = &cpu_data(cpu);
>>>>
>>>> -	if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X) {
>>>> -		pr_err_once("late loading on model 79 is disabled.\n");
>>>> +	/*
>>>> +	 * The Broadwell-EP processor with the microcode version less
>>>> +	 * then 0x0b000021 may result in system hang when running a late
>>>> +	 * loading. This behavior is documented in item BDF90, #334165
>>>> +	 * (Intel Xeon Processor E7-8800/4800 v4 Product Family).
>>>> +	 */
>>>> +	if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X &&
>>>> +	    c->x86_mask == 0x01 && c->microcode < 0x0b000021) {
>>>> +		pr_err_once("late loading on cpu (sig 0x406f1) is disabled "
>>>> +			    "due to erratum causing system hang.\n");
>>>
>>> Please never break user-readable messages mid-sentence!
>>>
>>> This should be something like:
>>>
>>> 	pr_err_once("Late loading of the CPU microcode (sig 0x406f1) is
>>> disabled due to Intel erratum BDF90 causing system hangs.\n");
>>>
>>> (note the spelling and readability improvements as well)
>>>
>>> Btw., what does 'sig 0x406f1' refer to?
>>
>> It is so-called processor signature which can be used to identify a
>> model of x86 processor uniquely. It's the return value of cpuid
>> instruction with leaf 1(eax == 1).
>
> Ah, indeed, the (somewhat weird) encoding described in arch/x86/lib/cpu.c, which
> is essentially family+model+stepping encoded into a single integer, right?
Totally correct.

>
> That whole area needs a good cleanup to be less confusing (we refer to the CPU
> stepping as x86_stepping(), but the field is called ->x86_mask?), but in the

Yes. This is a confusing name. I will send another patch to clean it up.

> meanwhile, let's please make it more obvious in user facing message what's
> happening.
>
> Instead of using the microcode signature of the CPU model, please write out what's
> going on:
>
> 	pr_err_once("Not loading old microcode version: erratum BDF90 on Intel
> Broadwell-EP stepping 1 CPUs may cause system hangs.\n");
>
> ... and please also tell the user what to do about it:
>
> 	pr_err_once("Please update your microcode files.\n");

Let me give the full background so that we can arrive at the best
description of this erratum.

If the current processor signature matches the problematic Broadwell-EP
model (0x406f1) *AND* the current microcode version is less than
0x0b000021, launching a microcode update at Linux runtime (so-called
late loading) must be prohibited in order to prevent a system hang due
to the erratum. Namely, the end user has to perform a BIOS update to
uprev the microcode. The microcode update loader in the BIOS can safely
issue a microcode update without concern about this erratum. This is
the so-called early loading manner.

Thanks,
Jia

>
> !!
>
> Agreed?
>
> Thanks,
>
> Ingo
>
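For reference, the condition described above can be sketched in plain user-space C. This is only an illustration, not the kernel code from the patch: struct cpu_sig, decode_sig() and late_load_blacklisted() are hypothetical names, and the decoding follows the usual CPUID leaf 1 EAX layout (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: decoded fields of the CPUID leaf 1 EAX value. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Split a processor signature such as 0x406f1 into family/model/stepping. */
static struct cpu_sig decode_sig(uint32_t eax)
{
	struct cpu_sig s;
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;
	unsigned int ext_family = (eax >> 20) & 0xff;
	unsigned int ext_model = (eax >> 16) & 0xf;

	s.stepping = eax & 0xf;

	/* The extended family is only added when the base family is 0xf. */
	s.family = base_family;
	if (base_family == 0xf)
		s.family += ext_family;

	/* The extended model supplies the high nibble for families 6 and 0xf. */
	s.model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ext_model << 4;

	return s;
}

/*
 * Hypothetical predicate mirroring the condition in this thread: late
 * loading is refused only on family 6, model 79, stepping 1 when the
 * running microcode revision is below 0x0b000021.
 */
static bool late_load_blacklisted(uint32_t sig_eax, uint32_t microcode_rev)
{
	struct cpu_sig s = decode_sig(sig_eax);

	return s.family == 6 && s.model == 79 && s.stepping == 1 &&
	       microcode_rev < 0x0b000021;
}
```

With this sketch, decode_sig(0x406f1) gives family 6, model 79 (0x4f), stepping 1, which is why the signature and the "family 6, model 79, stepping 1" wording in the changelog describe the same part. A revision of exactly 0x0b000021 is no longer blacklisted, since the comparison is strictly "less than".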