On 2/23/16 6:37 AM, Borislav Petkov wrote:
On Tue, Feb 16, 2016 at 03:45:08PM -0600, Aravind Gopalakrishnan wrote:
  /* AMD-specific bits */
  #define MCI_STATUS_DEFERRED   (1ULL<<44)  /* declare an uncorrected error */
  #define MCI_STATUS_POISON     (1ULL<<43)  /* access poisonous data */
+#define MCI_STATUS_TCC         (1ULL<<55)  /* Task context corrupt */
\n


Ack.

+/*
+ * McaX field if set indicates a given bank supports MCA extensions:
+ *  - Deferred error interrupt type is specifiable by bank
+ *  - BlkPtr field indicates prescence of extended MISC registers.
                                ^^^^^^^^^

Btw, that's MCi_MISC[BlkPtr] ?

MCi_MISC0[BlkPtr] specifically. Will update the comments about this.

Also, please integrate a spell checker into your patch creation
workflow.

Sorry about that. Looks like this pair is not defined in spelling.txt. So, might be worth adding there as well?

+ *    But should not be used to determine MSR numbers
+ *  - TCC bit is present in MCx_STATUS
All sentences end with a "."

Will fix.


+/*
+ * Enumerating new IP types and HWID values
Please stop using gerund, i.e., -ing, in comments and commit messages.

"Enumerate new IP ..." is just fine.

Ack.


+ * in ScalableMCA enabled AMD processors
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum ip_types {
AMD-specific so "amd_ip_types"

Ok, will fix.


+       F17H_CORE = 0,  /* Core errors */
+       DF,             /* Data Fabric */
+       UMC,            /* Unified Memory Controller */
+       FUSE,           /* FUSE subsystem */
What's FUSE subsystem?

It's the block for programming FUSE registers.


In any case, this could use a different name in order not to confuse
with Linux's filesystem in userspace.

Ok, will ask internally as well as to what name suits here.

+
+struct hwid {
amd_hwid and so on. All below should have the "amd_" prefix so that it
is obvious.

Will fix.


+       const char *ipname;
+       unsigned int hwid_value;
+};
+
+extern struct hwid hwid_mappings[N_IP_TYPES];
+
+enum core_mcatypes {
+       LS = 0,         /* Load Store */
+       IF,             /* Instruction Fetch */
+       L2_CACHE,       /* L2 cache */
+       DE,             /* Decoder unit */
+       RES,            /* Reserved */
+       EX,             /* Execution unit */
+       FP,             /* Floating Point */
+       L3_CACHE        /* L3 cache */
+};
+
+enum df_mcatypes {
+       CS = 0,         /* Coherent Slave */
+       PIE             /* Power management, Interrupts, etc */
+};
+#endif
So all those are defined here but we have a header for exactly that
drivers/edac/mce_amd.h. And then you define and export hwid_mappings in
arch/x86/kernel/cpu/mcheck/mce_amd.c to not use it there.

Why isn't all this in drivers/edac/mce_amd.[ch] ?

And if it is there, then you obviously don't need the "amd_" prefix.

I have a patch that uses these enums here. But I didn't send it out along with this patchset as I was testing the patch.
So yes, I need it here and in the EDAC driver.


+
  #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5523465..93bccbc 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -266,7 +266,9 @@
/* 'SMCA': AMD64 Scalable MCA */
  #define MSR_AMD64_SMCA_MC0_CONFIG     0xc0002004
+#define MSR_AMD64_SMCA_MC0_IPID                0xc0002005
  #define MSR_AMD64_SMCA_MCx_CONFIG(x)  (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_IPID(x)     (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
Are those MSRs used in multiple files? If not, -> mce.h.

Yes, I'll need them in arch/x86/.../mce_amd.c as well.
A later patch will be using it there.

+/* Defining HWID to IP type mappings for Scalable MCA */
" Define ..."

Ack


+       case L3_CACHE:
+               if (xec > (ARRAY_SIZE(f17h_l3_mce_desc) - 1))
+                       goto wrong_f17hcore_error;
+
+               pr_cont("%s.\n", f17h_l3_mce_desc[xec]);
+               break;
+
+       default:
+               goto wrong_f17hcore_error;
That's a lot of repeated code. You can assign the desc array to a temp
variable depending on mca_type and do the if and pr_cont only once using
that temp variable.

Ok, will simplify.


+
+       case PIE:
+               if (xec > (ARRAY_SIZE(f17h_pie_mce_desc) - 1))
+                       goto wrong_df_error;
+
+               pr_cont("%s.\n", f17h_pie_mce_desc[xec]);
+               break;
Ditto.


Will fix.

+
+/* Decode errors according to Scalable MCA specification */
+static void decode_smca_errors(struct mce *m)
+{
+       u32 low, high;
+       u32 addr = MSR_AMD64_SMCA_MCx_IPID(m->bank);
+       unsigned int hwid, mca_type, i;
+       u8 xec = XEC(m->status, xec_mask);
+
+       if (rdmsr_safe(addr, &low, &high)) {
+               pr_emerg("Unable to decode errors from banks\n");
That statement is not very useful in its current form.

How about "Unable to gather IP block that threw the error. Therefore cannot decode errors further.\n"

Or should I just remove the pr_emerg()?


+       for (i = 0; i < ARRAY_SIZE(hwid_mappings); i++)
So you use those hwid_mappings here. Why aren't they defined here too?

Same reason as the enums-
In a subsequent patch, I'll need it in arch/x86/../mce_amd.c
(I should have probably indicated this in the cover letter so you were aware of it. Sorry about that)


+       case SMU:
+               pr_emerg(HW_ERR "SMU Error: ");
+               if (xec > (ARRAY_SIZE(f17h_smu_mce_desc) - 1)) {
+                       pr_cont("Unrecognized error code from SMU MCA bank\n");
+                       return;
+               }
+               pr_cont("%s.\n", f17h_smu_mce_desc[xec]);
+               break;
Also a lot of repeated code which could be simplified.

Hmm, will try to simplify it in V2.


+ if (mce_flags.smca) {
ERROR: "mce_flags" [drivers/edac/edac_mce_amd.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs....

Just read CPUID again here instead of exporting mce_flags.

Right. Will fix this.


+               u32 smca_low, smca_high;
+               u32 smca_addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
s/smca_//

Will fix.


Also, all those MSRs should be defined in drivers/edac/mce_amd.h if not
used outside.

It will be used in arch/x86/../mce_amd.c


+
+               if (!rdmsr_safe(smca_addr, &smca_low, &smca_high) &&
+                   (smca_low & MCI_CONFIG_MCAX))
+                       pr_cont("|%s",
No need for that break here.

Ok, Will fix.


+ case 0x17:
+               xec_mask = 0x3f;
+               if (!mce_flags.smca) {
+                       printk(KERN_WARNING "Decoding supported only on Scalable MCA 
enabled processors\n");
What is that supposed to do? I thought all new ones will have SMCA...


If for some reason the CPUID bit is not set, then we should not assume the processor supports the features right?

Thanks,
-Aravind.

Reply via email to