We set up APIC vectors for threshold errors if interrupt_capable, but we don't set interrupt_enable by default. Set it during init so that, the first time we set up lvt_offset, threshold_restart_bank() also sets IntType to APIC.
The user can still disable interrupts through sysfs.

While at it, check that the status is valid before logging the error with mce_log(). On multi-node platforms, only the NBC has valid status info, so decoding status values on the non-NBC cores adds noise to the kernel log, like so:

  [ 440.509744] EDAC DEBUG: amd64_inject_write_store: section=0x80000000 word_bits=0x10020001
  [ 466.570925] [Hardware Error]: Corrected error, no action required.
  [ 466.570935] [Hardware Error]: CPU:25 (15:2:0) MC4_STATUS[-|CE|-|-|-
  [ 466.570936] [Hardware Error]: Corrected error, no action required.
  [ 466.570959] [Hardware Error]: CPU:26 (15:2:0) MC4_STATUS[-|CE|-|-|-
  <...>
  [ 466.571293] WARNING: CPU: 25 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
  [ 466.571301] WARNING: CPU: 26 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
  [ 466.571303] Something is rotten in the state of Denmark.

Suggested-by: Borislav Petkov <b...@suse.de>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
Changes in V2:
 - The earlier removal of the bank == 4 check and of the
   'interrupt_enable' attribute caused regressions. Fixed that.
 - Dropped the threshold_limit move and the comment style fixes, since
   they are not directly related to this patch.
 - Added a fix for garbled dmesg output on multi-node platforms and
   modified the commit message to reflect the change.

 arch/x86/kernel/cpu/mcheck/mce_amd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f1c3769..82c5144 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -250,6 +250,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 			if (!b.interrupt_capable)
 				goto init;
 
+			b.interrupt_enable = 1;
 			new	= (high & MASK_LVTOFF_HI) >> 20;
 			offset	= setup_APIC_mce(offset, new);
 
@@ -322,6 +323,8 @@ static void amd_threshold_interrupt(void)
 log:
 	mce_setup(&m);
 	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+	if (!(m.status & MCI_STATUS_VAL))
+		return;
 	m.misc = ((u64)high << 32) | low;
 	m.bank = bank;
 	mce_log(&m);
@@ -497,10 +500,12 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
 	b->interrupt_capable	= lvt_interrupt_supported(bank, high);
 	b->threshold_limit	= THRESHOLD_MAX;
 
-	if (b->interrupt_capable)
+	if (b->interrupt_capable) {
 		threshold_ktype.default_attrs[2] = &interrupt_enable.attr;
-	else
+		b->interrupt_enable = 1;
+	} else {
 		threshold_ktype.default_attrs[2] = NULL;
+	}
 
 	INIT_LIST_HEAD(&b->miscj);
 
-- 
2.1.0
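
For readers who want to try the new validity check outside the kernel, here is a minimal userspace sketch of the idea behind the MCI_STATUS_VAL test in amd_threshold_interrupt(). It is not the kernel code: the sketch_* names and the array of per-CPU status reads are hypothetical, and the only fact borrowed from the kernel is that the VAL flag is bit 63 of MCi_STATUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 63 of MCi_STATUS is the VAL bit (the kernel calls it MCI_STATUS_VAL). */
#define SKETCH_MCI_STATUS_VAL (1ULL << 63)

/* Hypothetical helper: true only if the bank status holds valid error info. */
static bool sketch_status_is_valid(uint64_t status)
{
	return (status & SKETCH_MCI_STATUS_VAL) != 0;
}

/*
 * Hypothetical stand-in for the "log:" path: count how many per-CPU status
 * reads would still be handed to the logger once invalid ones are skipped.
 */
static size_t sketch_count_loggable(const uint64_t *status, size_t n)
{
	size_t logged = 0;

	assert(status != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!sketch_status_is_valid(status[i]))
			continue;	/* e.g. a non-NBC core: nothing valid to decode */
		logged++;
	}
	return logged;
}
```

With three simulated reads where only one has the VAL bit set, sketch_count_loggable() counts a single loggable record, which is the behaviour the patch aims for on multi-node platforms.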