We set up APIC vectors for threshold errors if a block is
interrupt_capable. However, we don't set interrupt_enable by default.
Rework threshold_restart_bank() here so that the first time we
set up lvt_offset, we also set IntType to APIC.

The user can still disable interrupts through sysfs.
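
The intended default can be modelled in user space roughly as follows.
This is only a sketch: struct sketch_block and both helpers are
hypothetical stand-ins, not the kernel's struct threshold_block or its
sysfs handlers, and the -1 return merely marks "rejected".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-block threshold state. */
struct sketch_block {
	bool interrupt_capable;	/* block can raise a threshold interrupt */
	bool interrupt_enable;	/* interrupt is currently requested */
};

/* Enable the interrupt by default exactly when the block is capable. */
static void sketch_block_init(struct sketch_block *b, bool capable)
{
	b->interrupt_capable = capable;
	b->interrupt_enable = capable;
}

/*
 * Models a write to the interrupt_enable attribute: returns 0 on
 * success, -1 if the block cannot generate interrupts at all.
 */
static int sketch_store_interrupt_enable(struct sketch_block *b,
					 unsigned long val)
{
	if (!b->interrupt_capable)
		return -1;
	b->interrupt_enable = (val != 0);
	return 0;
}
```

In this model a capable block starts enabled, and a later write of 0
turns the interrupt off again.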

While at it, check that the status is valid before logging the error
with mce_log(). On multi-node platforms, only the NBC has valid status
info, so decoding status values on the non-NBC cores adds noise to the
kernel log, like so:

[  440.509744] EDAC DEBUG: amd64_inject_write_store: section=0x80000000 word_bits=0x10020001
[  466.570925] [Hardware Error]: Corrected error, no action required.
[  466.570935] [Hardware Error]: CPU:25 (15:2:0) MC4_STATUS[-|CE|-|-|-
[  466.570936] [Hardware Error]: Corrected error, no action required.
[  466.570959] [Hardware Error]: CPU:26 (15:2:0) MC4_STATUS[-|CE|-|-|-
<...>
[  466.571293] WARNING: CPU: 25 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
[  466.571301] WARNING: CPU: 26 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
[  466.571303] Something is rotten in the state of Denmark.
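
The validity check described above comes down to testing the VAL bit
(bit 63) of the bank's status word before logging. A minimal
stand-alone sketch, where SKETCH_MCI_STATUS_VAL mirrors the kernel's
MCI_STATUS_VAL and sketch_should_log() is a hypothetical helper used
only for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of MCi_STATUS is the "valid" bit (MCI_STATUS_VAL in the kernel). */
#define SKETCH_MCI_STATUS_VAL (1ULL << 63)

/* Hypothetical helper: log only when the status word is marked valid. */
static bool sketch_should_log(uint64_t status)
{
	return (status & SKETCH_MCI_STATUS_VAL) != 0;
}
```

A core whose MC4_STATUS read comes back without the VAL bit set would
skip logging under this check, regardless of the other status bits.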

Suggested-by: Borislav Petkov <b...@suse.de>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
Changes in V2:
 - The earlier changes that removed the bank == 4 check and the
   'interrupt_enable' attribute caused regressions. Fixed that.
 - Moving the setting of threshold_limit and the comment style fixes
   are not directly related to this patch, so drop them to cut out
   any distractions.
 - Add a fix for garbled dmesg output on multi-node platforms and
   modify the commit message to reflect the change.

 arch/x86/kernel/cpu/mcheck/mce_amd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f1c3769..82c5144 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -250,6 +250,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
                        if (!b.interrupt_capable)
                                goto init;
 
+                       b.interrupt_enable = 1;
                        new     = (high & MASK_LVTOFF_HI) >> 20;
                        offset  = setup_APIC_mce(offset, new);
 
@@ -322,6 +323,8 @@ static void amd_threshold_interrupt(void)
 log:
        mce_setup(&m);
        rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+       if (!(m.status & MCI_STATUS_VAL))
+               return;
        m.misc = ((u64)high << 32) | low;
        m.bank = bank;
        mce_log(&m);
@@ -497,10 +500,12 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
        b->interrupt_capable    = lvt_interrupt_supported(bank, high);
        b->threshold_limit      = THRESHOLD_MAX;
 
-       if (b->interrupt_capable)
+       if (b->interrupt_capable) {
                threshold_ktype.default_attrs[2] = &interrupt_enable.attr;
-       else
+               b->interrupt_enable = 1;
+       } else {
                threshold_ktype.default_attrs[2] = NULL;
+       }
 
        INIT_LIST_HEAD(&b->miscj);
 
-- 
2.1.0
