* Yazen Ghannam <yazen.ghan...@amd.com> wrote:
> From: Yazen Ghannam <yazen.ghan...@amd.com> > > The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and > later systems. This function is used in amd64_edac_mod to do > system-specific decoding for DRAM ECC errors. The function takes a > "NodeId" as a parameter. > > In AMD documentation, NodeId is used to identify a physical die in a > system. This can be used to identify a node in the AMD_NB code and also > it is used with umc_normaddr_to_sysaddr(). > > However, the input used for decode_dram_ecc() is currently the NUMA node > of a logical CPU. In the default configuration, the NUMA node and > physical die will be equivalent, so this doesn't have an impact. But the > NUMA node configuration can be adjusted with optional memory > interleaving schemes. This will cause the NUMA node enumeration to not > match the physical die enumeration. The mismatch will cause the address > translation function to fail or report incorrect results. > > Save the "NodeId" as a percpu value during init in AMD MCE code. Export > a function to return the value which can be used from modules like > edac_mce_amd. > > Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID") > Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com> > --- > arch/x86/include/asm/mce.h | 2 ++ > arch/x86/kernel/cpu/mce/amd.c | 11 +++++++++++ > drivers/edac/mce_amd.c | 2 +- > 3 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > index cf503824529c..92527cc9ed06 100644 > --- a/arch/x86/include/asm/mce.h > +++ b/arch/x86/include/asm/mce.h > @@ -343,6 +343,8 @@ extern struct smca_bank smca_banks[MAX_NR_BANKS]; > extern const char *smca_get_long_name(enum smca_bank_types t); > extern bool amd_mce_is_memory_error(struct mce *m); > > +extern u8 amd_cpu_to_node(unsigned int cpu); > + > extern int mce_threshold_create_device(unsigned int cpu); > extern int mce_threshold_remove_device(unsigned int cpu); > > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c > index 99be063fcb1b..524edf81e287 100644 > --- a/arch/x86/kernel/cpu/mce/amd.c > +++ b/arch/x86/kernel/cpu/mce/amd.c > @@ -202,6 +202,9 @@ static DEFINE_PER_CPU(unsigned int, bank_map); > /* Map of banks that have more than MCA_MISC0 available. */ > static DEFINE_PER_CPU(u32, smca_misc_banks_map); > > +/* CPUID_Fn8000001E_ECX[NodeId] used to identify a physical node/die. */ > +static DEFINE_PER_CPU(u8, node_id); > + > static void amd_threshold_interrupt(void); > static void amd_deferred_error_interrupt(void); > > @@ -233,6 +236,12 @@ static void smca_set_misc_banks_map(unsigned int bank, > unsigned int cpu) > > } > > +u8 amd_cpu_to_node(unsigned int cpu) > +{ > + return per_cpu(node_id, cpu); > +} > +EXPORT_SYMBOL_GPL(amd_cpu_to_node); > + > static void smca_configure(unsigned int bank, unsigned int cpu) > { > unsigned int i, hwid_mcatype; > @@ -240,6 +249,8 @@ static void smca_configure(unsigned int bank, unsigned > int cpu) > u32 high, low; > u32 smca_config = MSR_AMD64_SMCA_MCx_CONFIG(bank); > > + this_cpu_write(node_id, cpuid_ecx(0x8000001e) & 0xFF); So we already have this magic number used for a similar purpose, in amd_get_topology(): cpuid(0x8000001e, &eax, &ebx, &ecx, &edx); node_id = ecx & 0xff; Firstly, could we please at least give 0x8000001e a proper symbolic name, use it in hygon.c too (which AFAIK is derived from AMD anyway), and then use it in these new patches? Secondly, why not stick node_id into struct cpuinfo_x86, where the MCA code can then use it without having to introduce a new percpu data structure? There's also the underlying assumption that there's only ever going to be 256 nodes, which limitation I'm sure we'll hear about in a couple of years as not being quite enough. ;-) So less hardcoding and more generalizations please. Thanks, Ingo