On Thu, Mar 21, 2019 at 08:25:18PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <[email protected]>
> 
> AMD Family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
> errors under certain conditions. The errors are benign and can safely be
> ignored. However, the high error rate may cause the MCA threshold
> counter to overflow causing a high rate of thresholding interrupts. In
> addition, users may see the errors reported through the AMD MCE decoder
> module, even with the interrupt disabled, due to MCA polling.
> 
> This error is reported through the Instruction Fetch bank.
> 
> Clear the "Counter Present" bit in the Instruction Fetch bank's
> MCA_MISC0 register. This will prevent enabling MCA thresholding on this
> bank which will prevent the high interrupt rate due to this error.
> 
> Define a function to filter these errors from the MCE event pool.
> Install this function during AMD vendor init. The MCA banks are enabled
> after vendor init, so the filter function will be installed before the
> spurious errors will be reported.
> 
> Cc: <[email protected]> # 4.14.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
> Cc: <[email protected]> # 4.14.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
> Cc: <[email protected]> # 4.14.x
> Signed-off-by: Yazen Ghannam <[email protected]>
> ---
> Link:
> https://lkml.kernel.org/r/[email protected]
> 
> v1->v2:
> * Filter out the error earlier in MCE code rather than later in EDAC.
> 
>  arch/x86/kernel/cpu/mce/amd.c | 57 ++++++++++++++++++++++++++++-------
>  1 file changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index e64de5149e50..2db85f65b41e 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -563,22 +563,55 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
>       return offset;
>  }
>  
> +bool filter_mce_rv(struct mce *m)
> +{
> +     enum smca_bank_types bank_type = smca_get_bank_type(m->bank);
> +     u8 xec = (m->status >> 16) & 0x3F;
> +
> +     /*
> +      * Spurious errors of this type may be reported.
> +      * See Family 17h Models 10h-2Fh Erratum #1114.
> +      */
> +     if (bank_type == SMCA_IF && xec == 10)
> +             return true;
> +
> +     return false;
> +}
> +
> +static void filter_mce_check(struct cpuinfo_x86 *c)
> +{
> +     if (c->x86 == 0x17 && (c->x86_model >= 0x10 && c->x86_model <= 0x2F))
> +             filter_mce = filter_mce_rv;
> +}

Why all the noodling here with a check function which assigns a
filter_mce_rv (btw, that "rv" means nothing outside of AMD) and a
generic default_filter_mce?

Why not a simple filter_mce() in mce/core.c which calls amd_filter_mce()
based on x86_vendor, with amd_filter_mce() defined in mce/amd.c?
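
The shape suggested above can be sketched roughly as follows. This is a
user-space mock, not kernel code: struct cpu_info, struct mce_rec and the
two enums are hypothetical stand-ins, and the bank type is carried in the
record directly instead of being looked up with smca_get_bank_type(). Only
the family/model range, the IF bank check and the extended error code test
are taken from the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's vendor and SMCA bank type enums. */
enum vendor { VENDOR_INTEL, VENDOR_AMD };
enum bank_type { BANK_OTHER, BANK_IF };

/* Hypothetical, trimmed-down replacement for struct cpuinfo_x86. */
struct cpu_info {
	enum vendor vendor;
	unsigned int family;
	unsigned int model;
};

/* Hypothetical, trimmed-down replacement for struct mce. */
struct mce_rec {
	uint64_t status;
	enum bank_type bank_type; /* assumption: resolved by the caller */
};

/* Would live in mce/amd.c: vendor-specific knowledge stays here. */
static bool amd_filter_mce(const struct cpu_info *c, const struct mce_rec *m)
{
	/* Extended error code, extracted as in the quoted patch. */
	uint8_t xec = (m->status >> 16) & 0x3F;

	/* Family 17h Models 10h-2Fh, spurious IF bank error (Erratum #1114). */
	if (c->family == 0x17 && c->model >= 0x10 && c->model <= 0x2F &&
	    m->bank_type == BANK_IF && xec == 10)
		return true;

	return false;
}

/* Would live in mce/core.c: plain dispatch on vendor, no function pointer. */
static bool filter_mce(const struct cpu_info *c, const struct mce_rec *m)
{
	if (c->vendor == VENDOR_AMD)
		return amd_filter_mce(c, m);

	return false;
}
```

With this layout the core code has a single entry point and no
assignment step at vendor init; whether the model check belongs inside
amd_filter_mce() or in a cached flag is left open here.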

-- 
Regards/Gruss,
    Boris.
