[PATCH v2 4/6] EDAC/amd64: Use cached data when checking for ECC

2019-10-22 Thread Ghannam, Yazen
From: Yazen Ghannam 

...now that the data is available earlier.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20191018153114.39378-5-yazen.ghan...@amd.com

v1 -> v2:
* No change.

rfc -> v1:
* No change.

 drivers/edac/amd64_edac.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2d8129c8d183..6b6df53e8ae7 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3200,31 +3200,27 @@ static const char *ecc_msg =
"'ecc_enable_override'.\n"
" (Note that use of the override may cause unknown side effects.)\n";
 
-static bool ecc_enabled(struct pci_dev *F3, u16 nid)
+static bool ecc_enabled(struct amd64_pvt *pvt)
 {
+   u16 nid = pvt->mc_node_id;
bool nb_mce_en = false;
u8 ecc_en = 0, i;
u32 value;
 
if (boot_cpu_data.x86 >= 0x17) {
u8 umc_en_mask = 0, ecc_en_mask = 0;
+   struct amd64_umc *umc;
 
for_each_umc(i) {
-   u32 base = get_umc_base(i);
+   umc = &pvt->umc[i];
 
/* Only check enabled UMCs. */
-   if (amd_smn_read(nid, base + UMCCH_SDP_CTRL, &value))
-   continue;
-
-   if (!(value & UMC_SDP_INIT))
+   if (!(umc->sdp_ctrl & UMC_SDP_INIT))
continue;
 
umc_en_mask |= BIT(i);
 
-   if (amd_smn_read(nid, base + UMCCH_UMC_CAP_HI, &value))
-   continue;
-
-   if (value & UMC_ECC_ENABLED)
+   if (umc->umc_cap_hi & UMC_ECC_ENABLED)
ecc_en_mask |= BIT(i);
}
 
@@ -3237,7 +3233,7 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
/* Assume UMC MCA banks are enabled. */
nb_mce_en = true;
} else {
-   amd64_read_pci_cfg(F3, NBCFG, &value);
+   amd64_read_pci_cfg(pvt->F3, NBCFG, &value);
 
ecc_en = !!(value & NBCFG_ECC_ENABLE);
 
@@ -3522,7 +3518,7 @@ static int probe_one_instance(unsigned int nid)
if (ret < 0)
goto err_enable;
 
-   if (!ecc_enabled(F3, nid)) {
+   if (!ecc_enabled(pvt)) {
ret = 0;
 
if (!ecc_enable_override)
-- 
2.17.1
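
For context, a hedged note on how the two masks built above are
combined: ECC only counts as enabled when every enabled UMC also
reports ECC, and the comparison itself sits just below the quoted
hunk. A stand-alone sketch of the idea (not the in-tree code):

	#include <stdio.h>

	int main(void)
	{
		unsigned char umc_en_mask = 0x03; /* UMCs 0 and 1 enabled */
		unsigned char ecc_en_mask = 0x01; /* only UMC 0 has ECC on */

		/* Enabled only if any UMC is up and the masks match. */
		unsigned char ecc_en = umc_en_mask &&
				       (umc_en_mask == ecc_en_mask);

		printf("ecc_en = %u\n", ecc_en); /* 0: mixed config */
		return 0;
	}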



[PATCH v2 6/6] EDAC/amd64: Set grain per DIMM

2019-10-22 Thread Ghannam, Yazen
From: Yazen Ghannam 

The following commit introduced a warning on error reports with a zero
grain value.

  3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.

Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20191018153114.39378-7-yazen.ghan...@amd.com

v1 -> v2:
* No change.

rfc -> v1:
* New patch.

 drivers/edac/amd64_edac.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 114e7395daab..4ab7bcdede51 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2944,6 +2944,7 @@ static int init_csrows_df(struct mem_ctl_info *mci)
dimm->mtype = pvt->dram_type;
dimm->edac_mode = edac_mode;
dimm->dtype = dev_type;
+   dimm->grain = 64;
}
}
 
@@ -3020,6 +3021,7 @@ static int init_csrows(struct mem_ctl_info *mci)
dimm = csrow->channels[j]->dimm;
dimm->mtype = pvt->dram_type;
dimm->edac_mode = edac_mode;
+   dimm->grain = 64;
}
}
 
-- 
2.17.1
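
As a hedged aside on the fix: with the 64-byte grain set above, a
log2-style conversion yields grain_bits = 6, while a grain of 0 is
what trips the warning from 3724ace582d9. A tiny stand-alone demo of
the relationship (not the in-kernel conversion code):

	#include <stdio.h>

	int main(void)
	{
		unsigned long grain = 64;	/* bytes, per DIMM */
		unsigned int grain_bits = 0;

		while ((1UL << grain_bits) < grain)	/* log2(64) = 6 */
			grain_bits++;

		printf("grain=%lu -> grain_bits=%u\n", grain, grain_bits);
		return 0;
	}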



[PATCH v2 5/6] EDAC/amd64: Check for memory before fully initializing an instance

2019-10-22 Thread Ghannam, Yazen
From: Yazen Ghannam 

Return early before checking for ECC if the node does not have any
populated memory.

Free any cached hardware data before returning. Also, return 0 in this
case since this is not a failure. Other nodes may have memory and the
module should attempt to load an instance for them.

Move printing of hardware information to after the instance is
initialized, so that the information is only printed for nodes with
memory.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20191018153114.39378-6-yazen.ghan...@amd.com

v1 -> v2:
* No change.

rfc -> v1:
* Change message severity to "info".
  * A node without memory is a valid configuration. The user doesn't
need to be warned.
* Drop "DRAM ECC disabled" from message.
  * The message is given when no memory was detected on a node.
  * The state of DRAM ECC is not checked here.

 drivers/edac/amd64_edac.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 6b6df53e8ae7..114e7395daab 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2848,8 +2848,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
edac_dbg(1, "  DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
 
determine_ecc_sym_sz(pvt);
-
-   dump_misc_regs(pvt);
 }
 
 /*
@@ -3489,6 +3487,19 @@ static int init_one_instance(struct amd64_pvt *pvt)
return 0;
 }
 
+static bool instance_has_memory(struct amd64_pvt *pvt)
+{
+   bool cs_enabled = false;
+   int cs = 0, dct = 0;
+
+   for (dct = 0; dct < fam_type->max_mcs; dct++) {
+   for_each_chip_select(cs, dct, pvt)
+   cs_enabled |= csrow_enabled(cs, dct, pvt);
+   }
+
+   return cs_enabled;
+}
+
 static int probe_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
@@ -3518,6 +3529,12 @@ static int probe_one_instance(unsigned int nid)
if (ret < 0)
goto err_enable;
 
+   ret = 0;
+   if (!instance_has_memory(pvt)) {
+   amd64_info("Node %d: No DIMMs detected.\n", nid);
+   goto err_enable;
+   }
+
if (!ecc_enabled(pvt)) {
ret = 0;
 
@@ -3544,6 +3561,8 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}
 
+   dump_misc_regs(pvt);
+
return ret;
 
 err_enable:
-- 
2.17.1
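
For reference, instance_has_memory() leans on csrow_enabled(), which
tests the enable bit in a chip select base register. A stand-alone
sketch of that check (register layout per amd64_edac.h, hedged):

	#include <stdbool.h>
	#include <stdio.h>

	#define DCSB_CS_ENABLE	0x1	/* enable bit in a CS base register */

	static bool csrow_enabled(unsigned int csbase)
	{
		return csbase & DCSB_CS_ENABLE;
	}

	int main(void)
	{
		/* Two controllers x four chip selects; one rank populated. */
		unsigned int csbases[2][4] = { { 0 }, { 0x1, 0, 0, 0 } };
		bool cs_enabled = false;

		for (int dct = 0; dct < 2; dct++)
			for (int cs = 0; cs < 4; cs++)
				cs_enabled |= csrow_enabled(csbases[dct][cs]);

		printf("instance_has_memory: %d\n", cs_enabled); /* 1 */
		return 0;
	}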



[PATCH v2 2/6] EDAC/amd64: Gather hardware information early

2019-10-22 Thread Ghannam, Yazen
From: Yazen Ghannam 

Split out gathering hardware information from init_one_instance() into a
separate function hw_info_get().

This is necessary so that the information can be cached earlier and used
to check if memory is populated and if ECC is enabled on a node.

Also, define a function hw_info_put() to back out changes made in
hw_info_get(). Currently, this includes two actions: freeing reserved
PCI device siblings and freeing the allocated struct amd64_umc.

Check for an allocated PCI device (Function 0 for Family 17h or Function
1 for pre-Family 17h) before freeing, since hw_info_put() may be called
before PCI siblings are reserved.

Drop the family check when freeing pvt->umc. This will be NULL on
pre-Family 17h systems. However, kfree() is safe and will check for a
NULL pointer before freeing.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20191018153114.39378-3-yazen.ghan...@amd.com

v1 -> v2:
* Change get_hardware_info() to hw_info_get().
* Add hw_info_put() to backout changes from hw_info_get().

rfc -> v1:
* Fixup after making struct amd64_family_type fam_type global.

 drivers/edac/amd64_edac.c | 101 +++---
 1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b9a712819c68..df7dd9604bb2 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3416,34 +3416,15 @@ static void compute_num_umcs(void)
edac_dbg(1, "Number of UMCs: %x", num_umcs);
 }
 
-static int init_one_instance(unsigned int nid)
+static int hw_info_get(struct amd64_pvt *pvt)
 {
-   struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
-   struct mem_ctl_info *mci = NULL;
-   struct edac_mc_layer layers[2];
-   struct amd64_pvt *pvt = NULL;
u16 pci_id1, pci_id2;
-   int err = 0, ret;
-
-   ret = -ENOMEM;
-   pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
-   if (!pvt)
-   goto err_ret;
-
-   pvt->mc_node_id = nid;
-   pvt->F3 = F3;
-
-   ret = -EINVAL;
-   fam_type = per_family_init(pvt);
-   if (!fam_type)
-   goto err_free;
+   int ret = -EINVAL;
 
if (pvt->fam >= 0x17) {
pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), 
GFP_KERNEL);
-   if (!pvt->umc) {
-   ret = -ENOMEM;
-   goto err_free;
-   }
+   if (!pvt->umc)
+   return -ENOMEM;
 
pci_id1 = fam_type->f0_id;
pci_id2 = fam_type->f6_id;
@@ -3452,21 +3433,37 @@ static int init_one_instance(unsigned int nid)
pci_id2 = fam_type->f2_id;
}
 
-   err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
-   if (err)
-   goto err_post_init;
+   ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
+   if (ret)
+   return ret;
 
read_mc_regs(pvt);
 
+   return 0;
+}
+
+static void hw_info_put(struct amd64_pvt *pvt)
+{
+   if (pvt->F0 || pvt->F1)
+   free_mc_sibling_devs(pvt);
+
+   kfree(pvt->umc);
+}
+
+static int init_one_instance(struct amd64_pvt *pvt)
+{
+   struct mem_ctl_info *mci = NULL;
+   struct edac_mc_layer layers[2];
+   int ret = -EINVAL;
+
/*
 * We need to determine how many memory channels there are. Then use
 * that information for calculating the size of the dynamic instance
 * tables in the 'mci' structure.
 */
-   ret = -EINVAL;
pvt->channel_count = pvt->ops->early_channel_count(pvt);
if (pvt->channel_count < 0)
-   goto err_siblings;
+   return ret;
 
ret = -ENOMEM;
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
@@ -3488,9 +3485,9 @@ static int init_one_instance(unsigned int nid)
layers[1].size = 2;
layers[1].is_virt_csrow = false;
 
-   mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0);
+   mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
if (!mci)
-   goto err_siblings;
+   return ret;
 
mci->pvt_info = pvt;
mci->pdev = &pvt->F3->dev;
@@ -3503,31 +3500,17 @@ static int init_one_instance(unsigned int nid)
ret = -ENODEV;
if (edac_mc_add_mc_with_groups(mci, amd64_edac_attr_groups)) {
edac_dbg(1, "failed edac_mc_add_mc()\n");
-   goto err_add_mc;
+   edac_mc_free(mci);
+   return ret;
}
 
return 0;
-
-err_add_mc:
-   edac_mc_free(mci);
-
-err_siblings:
-   free_mc_sibling_devs(pvt);
-
-err_post_init:
-   if (pvt->fam >= 0x17)
-   kfree(pvt->umc);
-
-err_free:
-   kfree(pvt);
-
-err_ret:
-   return ret;
 }
 
 static int probe_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
+   struct amd64_pvt *pvt = NULL;
struct 
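
The archive cuts off the rest of this patch above. For orientation, a
hedged outline of the probe flow after this change, reconstructed from
the commit message and the rest of the series rather than quoted from
the patch:

	probe_one_instance(nid)
	    pvt = kzalloc(...)         /* allocate pvt early */
	    per_family_init(pvt)       /* sets the global fam_type */
	    hw_info_get(pvt)           /* reserve PCI siblings, read regs */
	    ecc_enabled(pvt)           /* patch 4/6 now uses cached regs */
	    init_one_instance(pvt)     /* layers, edac_mc_alloc, add mc */
	    /* on any failure: hw_info_put(pvt); kfree(pvt); */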

[PATCH v2 3/6] EDAC/amd64: Save max number of controllers to family type

2019-10-22 Thread Ghannam, Yazen
From: Yazen Ghannam 

The maximum number of memory controllers is fixed within a family/model
group. In most cases, this has been fixed at 2, but some systems may
have up to 8.

The struct amd64_family_type already contains family/model-specific
information, and this can be used rather than adding model checks to
various functions.

Create a new field in struct amd64_family_type for max_mcs.
Set this when setting other family type information, and use this when
needing the maximum number of memory controllers possible for a system.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20191018153114.39378-4-yazen.ghan...@amd.com

v1 -> v2:
* Change max_num_controllers to max_mcs.

rfc -> v1:
* New patch.
* Idea came up from Boris' comment about compute_num_umcs().

 drivers/edac/amd64_edac.c | 44 ++++++++++++++------------------------------
 drivers/edac/amd64_edac.h |  2 ++
 2 files changed, 16 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index df7dd9604bb2..2d8129c8d183 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -21,9 +21,6 @@ static struct amd64_family_type *fam_type;
 /* Per-node stuff */
 static struct ecc_settings **ecc_stngs;
 
-/* Number of Unified Memory Controllers */
-static u8 num_umcs;
-
 /*
  * Valid scrub rates for the K8 hardware memory scrubber. We map the scrubbing
  * bandwidth to a valid bit pattern. The 'set' operation finds the 'matching-
@@ -456,7 +453,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
for (i = 0; i < pvt->csels[dct].m_cnt; i++)
 
 #define for_each_umc(i) \
-   for (i = 0; i < num_umcs; i++)
+   for (i = 0; i < fam_type->max_mcs; i++)
 
 /*
  * @input_addr is an InputAddr associated with the node given by mci. Return the
@@ -2226,6 +2223,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "K8",
.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
+   .max_mcs = 2,
.ops = {
.early_channel_count= k8_early_channel_count,
.map_sysaddr_to_csrow   = k8_map_sysaddr_to_csrow,
@@ -2236,6 +2234,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F10h",
.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
+   .max_mcs = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2246,6 +2245,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h",
.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
+   .max_mcs = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2256,6 +2256,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
+   .max_mcs = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2266,6 +2267,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h_M60h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
+   .max_mcs = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2276,6 +2278,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F16h",
.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
+   .max_mcs = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2286,6 +2289,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F16h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
+   .max_mcs = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2296,6 +2300,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h",
.f0_id = 
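
This post is also truncated above, and the amd64_edac.h hunk is not
shown. A hedged sketch of the struct with the new field, reconstructed
from the commit message rather than quoted from the patch:

	struct amd64_family_type {
		const char *ctl_name;
		u16 f0_id, f1_id, f2_id, f6_id;
		u8 max_mcs;		/* max memory controllers per node */
		struct low_ops ops;
	};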

[PATCH v2 0/6] AMD64 EDAC: Check for nodes without memory, etc.

2019-10-22 Thread Ghannam, Yazen
From: Yazen Ghannam 

Hi Boris,

Most of these patches address the issue where the module checks and
complains about DRAM ECC on nodes without memory.

Thanks,
Yazen

Link:
https://lkml.kernel.org/r/20191018153114.39378-1-yazen.ghan...@amd.com

Yazen Ghannam (6):
  EDAC/amd64: Make struct amd64_family_type global
  EDAC/amd64: Gather hardware information early
  EDAC/amd64: Save max number of controllers to family type
  EDAC/amd64: Use cached data when checking for ECC
  EDAC/amd64: Check for memory before fully initializing an instance
  EDAC/amd64: Set grain per DIMM

 drivers/edac/amd64_edac.c | 196 +++---
 drivers/edac/amd64_edac.h |   2 +
 2 files changed, 100 insertions(+), 98 deletions(-)

-- 
2.17.1



[PATCH v2 1/6] EDAC/amd64: Make struct amd64_family_type global

2019-10-22 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct amd64_family_type doesn't change between multiple nodes and
instances of the modules, so make it global.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20191018153114.39378-2-yazen.ghan...@amd.com

v1 -> v2:
* No change.

rfc -> v1:
* New patch based on suggestion from Boris.

 drivers/edac/amd64_edac.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index c1d4536ae466..b9a712819c68 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -16,6 +16,8 @@ module_param(ecc_enable_override, int, 0644);
 
 static struct msr __percpu *msrs;
 
+static struct amd64_family_type *fam_type;
+
 /* Per-node stuff */
 static struct ecc_settings **ecc_stngs;
 
@@ -3278,8 +3280,7 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
}
 }
 
-static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
-struct amd64_family_type *fam)
+static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 {
struct amd64_pvt *pvt = mci->pvt_info;
 
@@ -3298,7 +3299,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
 
mci->edac_cap   = determine_edac_cap(pvt);
mci->mod_name   = EDAC_MOD_STR;
-   mci->ctl_name   = fam->ctl_name;
+   mci->ctl_name   = fam_type->ctl_name;
mci->dev_name   = pci_name(pvt->F3);
mci->ctl_page_to_phys   = NULL;
 
@@ -3312,8 +3313,6 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
  */
 static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
 {
-   struct amd64_family_type *fam_type = NULL;
-
pvt->ext_model  = boot_cpu_data.x86_model >> 4;
pvt->stepping   = boot_cpu_data.x86_stepping;
pvt->model  = boot_cpu_data.x86_model;
@@ -3420,7 +3419,6 @@ static void compute_num_umcs(void)
 static int init_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
-   struct amd64_family_type *fam_type = NULL;
struct mem_ctl_info *mci = NULL;
struct edac_mc_layer layers[2];
struct amd64_pvt *pvt = NULL;
@@ -3497,7 +3495,7 @@ static int init_one_instance(unsigned int nid)
mci->pvt_info = pvt;
mci->pdev = &pvt->F3->dev;
 
-   setup_mci_misc_attrs(mci, fam_type);
+   setup_mci_misc_attrs(mci);
 
if (init_csrows(mci))
mci->edac_cap = EDAC_FLAG_NONE;
-- 
2.17.1



RE: [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc.

2019-10-21 Thread Ghannam, Yazen
> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf Of Borislav Petkov
> Sent: Monday, October 21, 2019 10:48 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc.
> 
> On Fri, Oct 18, 2019 at 03:31:25PM +, Ghannam, Yazen wrote:
> > From: Yazen Ghannam 
> >
> > Hi Boris,
> >
> > This set contains the next revision of the RFC patches I included with
> > the last AMD64 EDAC updates. I dropped the RFC tags, and I added a
> > couple of new patches.
> 
> Yah, looks pretty much good, modulo the minor things I commented on
> earlier.
> 

Thank you. I'll send another revision soon.

-Yazen


[PATCH 0/6] AMD64 EDAC: Check for nodes without memory, etc.

2019-10-18 Thread Ghannam, Yazen
From: Yazen Ghannam 

Hi Boris,

This set contains the next revision of the RFC patches I included with
the last AMD64 EDAC updates. I dropped the RFC tags, and I added a
couple of new patches.

Most of these patches address the issue where the module checks and
complains about DRAM ECC on nodes without memory.

Patch 3 is new and came out of looking at the family type structs and
the boot flow.

Patch 6 fixes the "grain not set" warning that was recently introduced.

Thanks,
Yazen

Links:
https://lkml.kernel.org/r/20190821235938.118710-9-yazen.ghan...@amd.com
https://lkml.kernel.org/r/20190821235938.118710-10-yazen.ghan...@amd.com
https://lkml.kernel.org/r/20190821235938.118710-11-yazen.ghan...@amd.com

Yazen Ghannam (6):
  EDAC/amd64: Make struct amd64_family_type global
  EDAC/amd64: Gather hardware information early
  EDAC/amd64: Save max number of controllers to family type
  EDAC/amd64: Use cached data when checking for ECC
  EDAC/amd64: Check for memory before fully initializing an instance
  EDAC/amd64: Set grain per DIMM

 drivers/edac/amd64_edac.c | 174 --
 drivers/edac/amd64_edac.h |   1 +
 2 files changed, 94 insertions(+), 81 deletions(-)

-- 
2.17.1



[PATCH 5/6] EDAC/amd64: Check for memory before fully initializing an instance

2019-10-18 Thread Ghannam, Yazen
From: Yazen Ghannam 

Return early before checking for ECC if the node does not have any
populated memory.

Free any cached hardware data before returning. Also, return 0 in this
case since this is not a failure. Other nodes may have memory and the
module should attempt to load an instance for them.

Move printing of hardware information to after the instance is
initialized, so that the information is only printed for nodes with
memory.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-11-yazen.ghan...@amd.com

rfc -> v1:
* Change message severity to "info".
  * A node without memory is a valid configuration. The user doesn't
need to be warned.
* Drop "DRAM ECC disabled" from message.
  * The message is given when no memory was detected on a node.
  * The state of DRAM ECC is not checked here.

 drivers/edac/amd64_edac.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index feb68c0a3217..2a0a8be8f767 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2848,8 +2848,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
edac_dbg(1, "  DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
 
determine_ecc_sym_sz(pvt);
-
-   dump_misc_regs(pvt);
 }
 
 /*
@@ -3501,6 +3499,19 @@ static int init_one_instance(struct amd64_pvt *pvt)
return ret;
 }
 
+static bool instance_has_memory(struct amd64_pvt *pvt)
+{
+   bool cs_enabled = false;
+   int cs = 0, dct = 0;
+
+   for (dct = 0; dct < fam_type->max_num_controllers; dct++) {
+   for_each_chip_select(cs, dct, pvt)
+   cs_enabled |= csrow_enabled(cs, dct, pvt);
+   }
+
+   return cs_enabled;
+}
+
 static int probe_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
@@ -3530,6 +3541,12 @@ static int probe_one_instance(unsigned int nid)
if (ret < 0)
goto err_enable;
 
+   ret = 0;
+   if (!instance_has_memory(pvt)) {
+   amd64_info("Node %d: No DIMMs detected.\n", nid);
+   goto err_enable;
+   }
+
if (!ecc_enabled(pvt)) {
ret = 0;
 
@@ -3556,6 +3573,8 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}
 
+   dump_misc_regs(pvt);
+
return ret;
 
 err_enable:
-- 
2.17.1



[PATCH 4/6] EDAC/amd64: Use cached data when checking for ECC

2019-10-18 Thread Ghannam, Yazen
From: Yazen Ghannam 

...now that the data is available earlier.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-10-yazen.ghan...@amd.com

rfc -> v1:
* No change.

 drivers/edac/amd64_edac.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 0fde5ad2fdcd..feb68c0a3217 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3200,31 +3200,27 @@ static const char *ecc_msg =
"'ecc_enable_override'.\n"
" (Note that use of the override may cause unknown side effects.)\n";
 
-static bool ecc_enabled(struct pci_dev *F3, u16 nid)
+static bool ecc_enabled(struct amd64_pvt *pvt)
 {
+   u16 nid = pvt->mc_node_id;
bool nb_mce_en = false;
u8 ecc_en = 0, i;
u32 value;
 
if (boot_cpu_data.x86 >= 0x17) {
u8 umc_en_mask = 0, ecc_en_mask = 0;
+   struct amd64_umc *umc;
 
for_each_umc(i) {
-   u32 base = get_umc_base(i);
+   umc = &pvt->umc[i];
 
/* Only check enabled UMCs. */
-   if (amd_smn_read(nid, base + UMCCH_SDP_CTRL, &value))
-   continue;
-
-   if (!(value & UMC_SDP_INIT))
+   if (!(umc->sdp_ctrl & UMC_SDP_INIT))
continue;
 
umc_en_mask |= BIT(i);
 
-   if (amd_smn_read(nid, base + UMCCH_UMC_CAP_HI, &value))
-   continue;
-
-   if (value & UMC_ECC_ENABLED)
+   if (umc->umc_cap_hi & UMC_ECC_ENABLED)
ecc_en_mask |= BIT(i);
}
 
@@ -3237,7 +3233,7 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
/* Assume UMC MCA banks are enabled. */
nb_mce_en = true;
} else {
-   amd64_read_pci_cfg(F3, NBCFG, &value);
+   amd64_read_pci_cfg(pvt->F3, NBCFG, &value);
 
ecc_en = !!(value & NBCFG_ECC_ENABLE);
 
@@ -3534,7 +3530,7 @@ static int probe_one_instance(unsigned int nid)
if (ret < 0)
goto err_enable;
 
-   if (!ecc_enabled(F3, nid)) {
+   if (!ecc_enabled(pvt)) {
ret = 0;
 
if (!ecc_enable_override)
-- 
2.17.1



[PATCH 1/6] EDAC/amd64: Make struct amd64_family_type global

2019-10-18 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct amd64_family_type doesn't change between multiple nodes and
instances of the modules, so make it global.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-9-yazen.ghan...@amd.com

rfc -> v1:
* New patch based on suggestion from Boris.

 drivers/edac/amd64_edac.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index c1d4536ae466..b9a712819c68 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -16,6 +16,8 @@ module_param(ecc_enable_override, int, 0644);
 
 static struct msr __percpu *msrs;
 
+static struct amd64_family_type *fam_type;
+
 /* Per-node stuff */
 static struct ecc_settings **ecc_stngs;
 
@@ -3278,8 +3280,7 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
}
 }
 
-static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
-struct amd64_family_type *fam)
+static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 {
struct amd64_pvt *pvt = mci->pvt_info;
 
@@ -3298,7 +3299,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
 
mci->edac_cap   = determine_edac_cap(pvt);
mci->mod_name   = EDAC_MOD_STR;
-   mci->ctl_name   = fam->ctl_name;
+   mci->ctl_name   = fam_type->ctl_name;
mci->dev_name   = pci_name(pvt->F3);
mci->ctl_page_to_phys   = NULL;
 
@@ -3312,8 +3313,6 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
  */
 static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
 {
-   struct amd64_family_type *fam_type = NULL;
-
pvt->ext_model  = boot_cpu_data.x86_model >> 4;
pvt->stepping   = boot_cpu_data.x86_stepping;
pvt->model  = boot_cpu_data.x86_model;
@@ -3420,7 +3419,6 @@ static void compute_num_umcs(void)
 static int init_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
-   struct amd64_family_type *fam_type = NULL;
struct mem_ctl_info *mci = NULL;
struct edac_mc_layer layers[2];
struct amd64_pvt *pvt = NULL;
@@ -3497,7 +3495,7 @@ static int init_one_instance(unsigned int nid)
mci->pvt_info = pvt;
mci->pdev = &pvt->F3->dev;
 
-   setup_mci_misc_attrs(mci, fam_type);
+   setup_mci_misc_attrs(mci);
 
if (init_csrows(mci))
mci->edac_cap = EDAC_FLAG_NONE;
-- 
2.17.1



[PATCH 6/6] EDAC/amd64: Set grain per DIMM

2019-10-18 Thread Ghannam, Yazen
From: Yazen Ghannam 

The following commit introduced a warning on error reports with a zero
grain value.

  3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.

Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/724d6f97-61f2-94bd-3f4b-793a55b6a...@amd.com

rfc -> v1:
* New patch.

 drivers/edac/amd64_edac.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2a0a8be8f767..b5c0accfefcf 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2944,6 +2944,7 @@ static int init_csrows_df(struct mem_ctl_info *mci)
dimm->mtype = pvt->dram_type;
dimm->edac_mode = edac_mode;
dimm->dtype = dev_type;
+   dimm->grain = 64;
}
}
 
@@ -3020,6 +3021,7 @@ static int init_csrows(struct mem_ctl_info *mci)
dimm = csrow->channels[j]->dimm;
dimm->mtype = pvt->dram_type;
dimm->edac_mode = edac_mode;
+   dimm->grain = 64;
}
}
 
-- 
2.17.1



[PATCH 3/6] EDAC/amd64: Save max number of controllers to family type

2019-10-18 Thread Ghannam, Yazen
From: Yazen Ghannam 

The maximum number of memory controllers is fixed within a family/model
group. In most cases, this has been fixed at 2, but some systems may
have up to 8.

The struct amd64_family_type already contains family/model-specific
information, and this can be used rather than adding model checks to
various functions.

Create a new field in struct amd64_family_type for max_num_controllers.
Set this when setting other family type information, and use this when
needing the maximum number of memory controllers possible for a system.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-9-yazen.ghan...@amd.com

rfc -> v1:
* New patch.
* Idea came up from Boris' comment about compute_num_umcs().

 drivers/edac/amd64_edac.c | 45 +++++++++++++++------------------------------
 drivers/edac/amd64_edac.h |  1 +
 2 files changed, 16 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4410da7c3a25..0fde5ad2fdcd 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -21,9 +21,6 @@ static struct amd64_family_type *fam_type;
 /* Per-node stuff */
 static struct ecc_settings **ecc_stngs;
 
-/* Number of Unified Memory Controllers */
-static u8 num_umcs;
-
 /*
  * Valid scrub rates for the K8 hardware memory scrubber. We map the scrubbing
  * bandwidth to a valid bit pattern. The 'set' operation finds the 'matching-
@@ -456,7 +453,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
for (i = 0; i < pvt->csels[dct].m_cnt; i++)
 
 #define for_each_umc(i) \
-   for (i = 0; i < num_umcs; i++)
+   for (i = 0; i < fam_type->max_num_controllers; i++)
 
 /*
  * @input_addr is an InputAddr associated with the node given by mci. Return the
@@ -2226,6 +2223,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "K8",
.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
+   .max_num_controllers = 2,
.ops = {
.early_channel_count= k8_early_channel_count,
.map_sysaddr_to_csrow   = k8_map_sysaddr_to_csrow,
@@ -2236,6 +2234,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F10h",
.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
+   .max_num_controllers = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2246,6 +2245,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h",
.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
+   .max_num_controllers = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2256,6 +2256,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
+   .max_num_controllers = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2266,6 +2267,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h_M60h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
+   .max_num_controllers = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2276,6 +2278,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F16h",
.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
+   .max_num_controllers = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2286,6 +2289,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F16h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
+   .max_num_controllers = 2,
.ops = {
.early_channel_count= f1x_early_channel_count,
.map_sysaddr_to_csrow   = f1x_map_sysaddr_to_csrow,
@@ -2296,6 +2300,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = 

[PATCH 2/6] EDAC/amd64: Gather hardware information early

2019-10-18 Thread Ghannam, Yazen
From: Yazen Ghannam 

Split out gathering hardware information from init_one_instance() into a
separate function get_hardware_info().

This is necessary so that the information can be cached earlier and used
to check if memory is populated and if ECC is enabled on a node.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-9-yazen.ghan...@amd.com

rfc -> v1:
* Fixup after making struct amd64_family_type fam_type global.

 drivers/edac/amd64_edac.c | 72 +++
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b9a712819c68..4410da7c3a25 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3416,33 +3416,16 @@ static void compute_num_umcs(void)
edac_dbg(1, "Number of UMCs: %x", num_umcs);
 }
 
-static int init_one_instance(unsigned int nid)
+static int get_hardware_info(struct amd64_pvt *pvt)
 {
-   struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
-   struct mem_ctl_info *mci = NULL;
-   struct edac_mc_layer layers[2];
-   struct amd64_pvt *pvt = NULL;
u16 pci_id1, pci_id2;
-   int err = 0, ret;
-
-   ret = -ENOMEM;
-   pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
-   if (!pvt)
-   goto err_ret;
-
-   pvt->mc_node_id = nid;
-   pvt->F3 = F3;
-
-   ret = -EINVAL;
-   fam_type = per_family_init(pvt);
-   if (!fam_type)
-   goto err_free;
+   int ret = -EINVAL;
 
if (pvt->fam >= 0x17) {
pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), 
GFP_KERNEL);
if (!pvt->umc) {
ret = -ENOMEM;
-   goto err_free;
+   goto err_ret;
}
 
pci_id1 = fam_type->f0_id;
@@ -3452,18 +3435,33 @@ static int init_one_instance(unsigned int nid)
pci_id2 = fam_type->f2_id;
}
 
-   err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
-   if (err)
+   ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
+   if (ret)
goto err_post_init;
 
read_mc_regs(pvt);
 
+   return 0;
+
+err_post_init:
+   if (pvt->fam >= 0x17)
+   kfree(pvt->umc);
+
+err_ret:
+   return ret;
+}
+
+static int init_one_instance(struct amd64_pvt *pvt)
+{
+   struct mem_ctl_info *mci = NULL;
+   struct edac_mc_layer layers[2];
+   int ret = -EINVAL;
+
/*
 * We need to determine how many memory channels there are. Then use
 * that information for calculating the size of the dynamic instance
 * tables in the 'mci' structure.
 */
-   ret = -EINVAL;
pvt->channel_count = pvt->ops->early_channel_count(pvt);
if (pvt->channel_count < 0)
goto err_siblings;
@@ -3488,7 +3486,7 @@ static int init_one_instance(unsigned int nid)
layers[1].size = 2;
layers[1].is_virt_csrow = false;
 
-   mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0);
+   mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
if (!mci)
goto err_siblings;
 
@@ -3514,20 +3512,16 @@ static int init_one_instance(unsigned int nid)
 err_siblings:
free_mc_sibling_devs(pvt);
 
-err_post_init:
if (pvt->fam >= 0x17)
kfree(pvt->umc);
 
-err_free:
-   kfree(pvt);
-
-err_ret:
return ret;
 }
 
 static int probe_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
+   struct amd64_pvt *pvt = NULL;
struct ecc_settings *s;
int ret;
 
@@ -3538,6 +3532,21 @@ static int probe_one_instance(unsigned int nid)
 
ecc_stngs[nid] = s;
 
+   pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
+   if (!pvt)
+   goto err_settings;
+
+   pvt->mc_node_id = nid;
+   pvt->F3 = F3;
+
+   fam_type = per_family_init(pvt);
+   if (!fam_type)
+   goto err_enable;
+
+   ret = get_hardware_info(pvt);
+   if (ret < 0)
+   goto err_enable;
+
if (!ecc_enabled(F3, nid)) {
ret = 0;
 
@@ -3554,7 +3563,7 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}
 
-   ret = init_one_instance(nid);
+   ret = init_one_instance(pvt);
if (ret < 0) {
amd64_err("Error probing instance: %d\n", nid);
 
@@ -3567,6 +3576,9 @@ static int probe_one_instance(unsigned int nid)
return ret;
 
 err_enable:
+   kfree(pvt);
+
+err_settings:
kfree(s);
ecc_stngs[nid] = NULL;
 
-- 
2.17.1



RE: [RFC PATCH v3 08/10] EDAC/amd64: Gather hardware information early

2019-09-06 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, September 6, 2019 3:35 PM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v3 08/10] EDAC/amd64: Gather hardware information 
> early
> 
> On Fri, Sep 06, 2019 at 07:14:57PM +, Ghannam, Yazen wrote:
> > This struct is used per channel, so we may have 2-8 per system.
> 
> Ah, true.
> 
> > We could fix it at the max (8). What do you think?
> 
> Anything in struct amd64_umc that is shared between those channels or
> all max 8 of them can be distinct?
> 


All the fields are register values, and there are unique instances for each 
channel. They can
potentially all be different.

Thanks,
Yazen
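
For context on the struct being discussed, a hedged sketch of
struct amd64_umc as used in the driver (all fields are raw per-channel
register values, matching the description above; field list from the
driver, not quoted from this thread):

	struct amd64_umc {
		u32 dimm_cfg;		/* DIMM Configuration register */
		u32 umc_cfg;		/* Configuration register */
		u32 sdp_ctrl;		/* SDP Control register */
		u32 ecc_ctrl;		/* DRAM ECC Control register */
		u32 umc_cap_hi;		/* Capabilities High register */
	};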


RE: [RFC PATCH v3 08/10] EDAC/amd64: Gather hardware information early

2019-09-06 Thread Ghannam, Yazen
> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf Of Borislav Petkov
> Sent: Thursday, August 29, 2019 4:23 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v3 08/10] EDAC/amd64: Gather hardware information 
> early
> 
> On Thu, Aug 22, 2019 at 12:00:02AM +, Ghannam, Yazen wrote:
> > From: Yazen Ghannam 
> >
> > Split out gathering hardware information from init_one_instance() into a
> > separate function get_hardware_info().
> >
> > This is necessary so that the information can be cached earlier and used
> > to check if memory is populated and if ECC is enabled on a node.
> >
> > Signed-off-by: Yazen Ghannam 
> > ---
> >  drivers/edac/amd64_edac.c | 76 +++
> >  1 file changed, 45 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index 4d1e6daa7ec4..84832771dec0 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -3405,34 +3405,17 @@ static void compute_num_umcs(void)
> > edac_dbg(1, "Number of UMCs: %x", num_umcs);
> >  }
> >
> > -static int init_one_instance(unsigned int nid)
> > +static int get_hardware_info(struct amd64_pvt *pvt,
> > +struct amd64_family_type *fam_type)
> >  {
> > -   struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
> > -   struct amd64_family_type *fam_type = NULL;
> > -   struct mem_ctl_info *mci = NULL;
> > -   struct edac_mc_layer layers[2];
> > -   struct amd64_pvt *pvt = NULL;
> > u16 pci_id1, pci_id2;
> > -   int err = 0, ret;
> > -
> > -   ret = -ENOMEM;
> > -   pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
> > -   if (!pvt)
> > -   goto err_ret;
> > -
> > -   pvt->mc_node_id = nid;
> > -   pvt->F3 = F3;
> > -
> > -   ret = -EINVAL;
> > -   fam_type = per_family_init(pvt);
> > -   if (!fam_type)
> > -   goto err_free;
> > +   int ret = -EINVAL;
> >
> > if (pvt->fam >= 0x17) {
> > pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), 
> > GFP_KERNEL);
> 
> Yeah, a get_hardware_info() function which does an allocation of that
> struct amd64_umc on >= F17 which is only 20 bytes. Just add it into the
> pvt struct:
> 
> struct amd64_pvt {
>   ...
>   struct amd64_umc umc;  /* UMC registers */
> };
> 
> and be done with it. This should simplify the code flow here a bit and
> 20 bytes more per pvt is not a big deal.
> 

This struct is used per channel, so we may have 2-8 per system. We could fix it 
at the max (8).
What do you think?

Thanks,
Yazen


RE: [PATCH v3 0/8] AMD64 EDAC fixes

2019-08-26 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Monday, August 26, 2019 9:59 AM
> To: Ghannam, Yazen 
> Cc: Adam Borowski ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 0/8] AMD64 EDAC fixes
> 
> On Mon, Aug 26, 2019 at 02:19:18PM +, Ghannam, Yazen wrote:
> > I was tracking down the failure with ECC disabled, and that seems to be it.
> >
> > So I think we should return 0 "if (!edac_has_mcs())", because we'd only get
> > there if ECC is disabled on all nodes and there wasn't some other 
> > initialization
> > error.
> >
> > I'll send a patch for this soon.
> >
> > Adam, would you mind testing this patch?
> 
> You can't return 0 when ECC is disabled on all nodes because then the
> driver remains loaded without driving anything. That silly userspace
> needs to understand that ENODEV means "stop trying to load this driver".
> 

Yes, you're right.

I'll try and track down the interaction here between userspace and the module.
Please let me know if you have any suggestions.

Thanks,
Yazen


RE: [PATCH v3 0/8] AMD64 EDAC fixes

2019-08-26 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, August 23, 2019 10:38 AM
> To: Ghannam, Yazen 
> Cc: Adam Borowski ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 0/8] AMD64 EDAC fixes
> 
> On Fri, Aug 23, 2019 at 03:28:59PM +, Ghannam, Yazen wrote:
> > Boris, Do you think it'd be appropriate to change the return values
> > for some cases?
> >
> > For example, ECC disabled is a hardware configuration. This doesn't
> > mean that the module failed any operations in this case.
> >
> > In other words, the module checks for a feature. If the feature is not
> > present, then return without failure (and maybe give a message).
> 
> That makes sense but AFAICT if probe_one_instance() sees that ECC is not
> enabled, it returns 0.
> 
> The "if (!edac_has_mcs())" check later is to verify that at least once
> instance was loaded successfully and, if not, then return an error.
> 
> So where does it return failure?
> 

I was tracking down the failure with ECC disabled, and that seems to be it.

So I think we should return 0 "if (!edac_has_mcs())", because we'd only get
there if ECC is disabled on all nodes and there wasn't some other initialization
error.

I'll send a patch for this soon.

Adam, would you mind testing this patch?

Thanks,
Yazen
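
For context, a hedged sketch of the module-init tail under discussion
(identifier names approximate, not quoted from the tree):

	/* at the end of amd64_edac_init(), after probing every node: */
	if (!edac_has_mcs()) {
		ret = -ENODEV;	/* no instance loaded on any node */
		goto err_pci;
	}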


RE: [PATCH v3 0/8] AMD64 EDAC fixes

2019-08-23 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Ghannam, Yazen
> Sent: Thursday, August 22, 2019 1:54 PM
> To: Adam Borowski 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; b...@alien8.de
> Subject: RE: [PATCH v3 0/8] AMD64 EDAC fixes
> 
...
> I wonder if the module is being loaded multiple times. I'll try to reproduce 
> this and track it down.
> 

I was able to reproduce a similar failure. I do see that the module is being 
loaded multiple times on failure.

Here's a call trace from one dump_stack() output:
[  +0.004964] CPU: 132 PID: 2680 Comm: systemd-udevd Not tainted 4.20.0-edac-debug+ #36
[  +0.009802] Call Trace:
[  +0.002727]  dump_stack+0x63/0x85
[  +0.003696]  amd64_edac_init+0x2163/0x3000 [amd64_edac_mod]
[  +0.006216]  ? __wake_up+0x13/0x20
[  +0.003790]  ? 0xc120d000
[  +0.003694]  do_one_initcall+0x4a/0x1c9
[  +0.004277]  ? _cond_resched+0x19/0x40
[  +0.004178]  ? kmem_cache_alloc_trace+0x15c/0x1d0
[  +0.005244]  do_init_module+0x5f/0x216
[  +0.004180]  load_module+0x21d5/0x2ac0
[  +0.004179]  ? wait_woken+0x80/0x80
[  +0.003889]  __do_sys_finit_module+0xfc/0x120
[  +0.004858]  ? __do_sys_finit_module+0xfc/0x120
[  +0.005052]  __x64_sys_finit_module+0x1a/0x20
[  +0.004857]  do_syscall_64+0x5a/0x120
[  +0.004081]  entry_SYSCALL_64_after_hwframe+0x44/0xa9


So it seems that userspace (systemd-udevd) keeps trying to load the module. I'm 
not sure how to prevent this from within the module.

Boris,
Do you think it'd be appropriate to change the return values for some cases?

For example, ECC disabled is a hardware configuration. This doesn't mean that 
the module failed any operations in this case.

In other words, the module checks for a feature. If the feature is not present, 
then return without failure (and maybe give a message).

Thanks,
Yazen


RE: [PATCH v3 7/8] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

2019-08-23 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, August 23, 2019 6:26 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 7/8] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs
> 
> On Thu, Aug 22, 2019 at 12:00:02AM +, Ghannam, Yazen wrote:
> > From: Yazen Ghannam 
> >
> > Future AMD systems will support "Asymmetric" Dual-Rank DIMMs. These are
> > DIMMs where the ranks are of different sizes.
> >
> > The even rank will use the Primary Even Chip Select registers and the
> > odd rank will use the Secondary Odd Chip Select registers.
> >
> > Recognize if a Secondary Odd Chip Select is being used. Use the
> > Secondary Odd Address Mask when calculating the chip select size.
> >
> > Signed-off-by: Yazen Ghannam 
> > ---
> > Link:
> > https://lkml.kernel.org/r/20190709215643.171078-8-yazen.ghan...@amd.com
> >
> > v2->v3:
> > * Add check of csrow_nr before using secondary mask.
> >
> > v1->v2:
> > * No change.
> >
> >  drivers/edac/amd64_edac.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index 26ce48fcaf00..4d1e6daa7ec4 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -790,9 +790,13 @@ static void debug_dump_dramcfg_low(struct amd64_pvt 
> > *pvt, u32 dclr, int chan)
> >
> >  #define CS_EVEN_PRIMARY BIT(0)
> >  #define CS_ODD_PRIMARY BIT(1)
> > +#define CS_EVEN_SECONDARY  BIT(2)
> > +#define CS_ODD_SECONDARY   BIT(3)
> >
> > -#define CS_EVEN CS_EVEN_PRIMARY
> > -#define CS_ODD CS_ODD_PRIMARY
> > +#define CS_EVEN (CS_EVEN_PRIMARY | CS_EVEN_SECONDARY)
> > +#define CS_ODD (CS_ODD_PRIMARY | CS_EVEN_SECONDARY)
> 
> That's just my urge to have stuff ballanced but shouldn't that last line be:
> 
> #define CS_ODD (CS_ODD_PRIMARY | CS_ODD_SECONDARY)
> 
> i.e., not have "even" as in CS_EVEN_SECONDARY in there but only "odd"s? :)
> 

Yes, sorry I missed that.

> > +#define csrow_sec_enabled(i, dct, pvt) ((pvt)->csels[(dct)].csbases_sec[(i)] & DCSB_CS_ENABLE)
> 
> I moved that to the header, under csrow_enabled().
> 

Okay, thank you.

-Yazen


RE: [PATCH v3 0/8] AMD64 EDAC fixes

2019-08-22 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Adam Borowski
> Sent: Wednesday, August 21, 2019 7:50 PM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; b...@alien8.de
> Subject: Re: [PATCH v3 0/8] AMD64 EDAC fixes
> 
> On Wed, Aug 21, 2019 at 11:59:53PM +, Ghannam, Yazen wrote:
> > I've also added RFC patches to avoid the "ECC disabled" message for
> > nodes without memory. I haven't fully tested these, but I wanted to get
> > your thoughts. Here's an earlier discussion:
> > https://lkml.kernel.org/r/20180321191335.7832-1-yazen.ghan...@amd.com
> 
> While you're editing that code, could you please also cut the spam if ECC is
> actually disabled?  For example, a 2990WX with non-ECC RAM gets 1024 lines;
> 64 copies of:
> 
> [8.186164] EDAC amd64: Node 0: DRAM ECC disabled.
> [8.188364] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
> module will not load.
> Either enable ECC checking or force module loading by setting 
> 'ecc_enable_override'.
> (Note that use of the override may cause unknown side 
> effects.)
> [8.194762] EDAC amd64: Node 1: DRAM ECC disabled.
> [8.196307] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
> module will not load.
> Either enable ECC checking or force module loading by setting 
> 'ecc_enable_override'.
> (Note that use of the override may cause unknown side 
> effects.)
> [8.199840] EDAC amd64: Node 2: DRAM ECC disabled.
> [8.200963] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
> module will not load.
> Either enable ECC checking or force module loading by setting 
> 'ecc_enable_override'.
> (Note that use of the override may cause unknown side 
> effects.)
> [8.204326] EDAC amd64: Node 3: DRAM ECC disabled.
> [8.205436] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
> module will not load.
> Either enable ECC checking or force module loading by setting 
> 'ecc_enable_override'.
> (Note that use of the override may cause unknown side 
> effects.)
> 

I wonder if the module is being loaded multiple times. I'll try to reproduce 
this and track it down.

Thanks for testing and reporting this issue.

-Yazen


[RFC PATCH v2] EDAC/amd64: Check for memory before fully initializing an instance

2019-08-22 Thread Ghannam, Yazen
From: Yazen Ghannam 

Return early before checking for ECC if the node does not have any
populated memory.

Free any cached hardware data before returning. Also, return 0 in this
case since this is not a failure. Other nodes may have memory and the
module should attempt to load an instance for them.

Move printing of hardware information to after the instance is
initialized, so that the information is only printed for nodes with
memory.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190821235938.118710-11-yazen.ghan...@amd.com

v1->v2:
* Moved hardware info printing to after instance is initialized.
* Added message for when instance has no memory.

 drivers/edac/amd64_edac.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index c1cb0234f085..3f0fe6ed1fa3 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2831,8 +2831,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
edac_dbg(1, "  DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
 
determine_ecc_sym_sz(pvt);
-
-   dump_misc_regs(pvt);
 }
 
 /*
@@ -3505,6 +3503,23 @@ static int init_one_instance(struct amd64_pvt *pvt,
return ret;
 }
 
+static bool instance_has_memory(struct amd64_pvt *pvt)
+{
+   bool cs_enabled = false;
+   int num_channels = 2;
+   int cs = 0, dct = 0;
+
+   if (pvt->umc)
+   num_channels = num_umcs;
+
+   for (dct = 0; dct < num_channels; dct++) {
+   for_each_chip_select(cs, dct, pvt)
+   cs_enabled |= csrow_enabled(cs, dct, pvt);
+   }
+
+   return cs_enabled;
+}
+
 static int probe_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
@@ -3535,6 +3550,12 @@ static int probe_one_instance(unsigned int nid)
if (ret < 0)
goto err_enable;
 
+   ret = 0;
+   if (!instance_has_memory(pvt)) {
+   amd64_warn("Node %d: DRAM ECC disabled. No DIMMs detected.\n", nid);
+   goto err_enable;
+   }
+
if (!ecc_enabled(pvt)) {
ret = 0;
 
@@ -3561,6 +3582,8 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}
 
+   dump_misc_regs(pvt);
+
return ret;
 
 err_enable:
-- 
2.17.1



[PATCH v3 7/8] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

Future AMD systems will support "Asymmetric" Dual-Rank DIMMs. These are
DIMMs where the ranks are of different sizes.

The even rank will use the Primary Even Chip Select registers and the
odd rank will use the Secondary Odd Chip Select registers.

Recognize if a Secondary Odd Chip Select is being used. Use the
Secondary Odd Address Mask when calculating the chip select size.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190709215643.171078-8-yazen.ghan...@amd.com

v2->v3:
* Add check of csrow_nr before using secondary mask.

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 26ce48fcaf00..4d1e6daa7ec4 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -790,9 +790,13 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 
 #define CS_EVEN_PRIMARY BIT(0)
 #define CS_ODD_PRIMARY BIT(1)
+#define CS_EVEN_SECONDARY  BIT(2)
+#define CS_ODD_SECONDARY   BIT(3)
 
-#define CS_EVEN CS_EVEN_PRIMARY
-#define CS_ODD CS_ODD_PRIMARY
+#define CS_EVEN (CS_EVEN_PRIMARY | CS_EVEN_SECONDARY)
+#define CS_ODD (CS_ODD_PRIMARY | CS_EVEN_SECONDARY)
+
+#define csrow_sec_enabled(i, dct, pvt) ((pvt)->csels[(dct)].csbases_sec[(i)] & DCSB_CS_ENABLE)
 
 static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
@@ -804,6 +808,10 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
cs_mode |= CS_ODD_PRIMARY;
 
+   /* Asymmetric Dual-Rank DIMM support. */
+   if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt))
+   cs_mode |= CS_ODD_SECONDARY;
+
return cs_mode;
 }
 
@@ -1600,7 +1608,11 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 */
dimm = csrow_nr >> 1;
 
-   addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+   /* Asymmetric Dual-Rank DIMM support. */
+   if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
+   addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
+   else
+   addr_mask_orig = pvt->csels[umc].csmasks[dimm];
 
/*
 * The number of zero bits in the mask is equal to the number of bits
-- 
2.17.1
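
To make the rank-to-chip-select bookkeeping concrete, a stand-alone
hedged sketch of the selection this patch adds, for one asymmetric
dual-rank DIMM (even rank behind the primary CS, odd rank behind the
secondary CS):

	#include <stdio.h>

	#define BIT(n)			(1u << (n))
	#define CS_EVEN_PRIMARY		BIT(0)
	#define CS_ODD_PRIMARY		BIT(1)
	#define CS_EVEN_SECONDARY	BIT(2)
	#define CS_ODD_SECONDARY	BIT(3)

	int main(void)
	{
		unsigned int cs_mode = CS_EVEN_PRIMARY | CS_ODD_SECONDARY;
		int csrow_nr = 1;	/* the odd chip select of DIMM 0 */

		/* Same test f17_addr_mask_to_cs_size() gains above: */
		if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
			printf("use csmasks_sec[%d]\n", csrow_nr >> 1);
		else
			printf("use csmasks[%d]\n", csrow_nr >> 1);

		return 0;
	}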



[PATCH v3 2/8] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting EDAC_CTL_CAP.

Set the appropriate EDAC_CTL_CAP flag based on the device type.

Default to x8 DRAM devices when neither the x4 nor the x16 bit is set.

Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190709215643.171078-3-yazen.ghan...@amd.com

v2->v3:
* Add case for x8 DRAM devices.

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index dd60cf5a3d96..0e8b2137edbb 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 static inline void
 f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 {
-   u8 i, ecc_en = 1, cpk_en = 1;
+   u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
 
for_each_umc(i) {
if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
+
+   dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
+   dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
}
}
 
@@ -3163,8 +3166,14 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
if (ecc_en) {
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
 
-   if (cpk_en)
-   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+   if (cpk_en) {
+   if (dev_x4)
+   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+   else if (dev_x16)
+   mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
+   else
+   mci->edac_ctl_cap |= EDAC_FLAG_S8ECD8ED;
+   }
}
 }
 
-- 
2.17.1



[PATCH v3 3/8] EDAC/amd64: Initialize DIMM info for systems with more than two channels

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it has a
limit of two channel support.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190709215643.171078-4-yazen.ghan...@amd.com

v2->v3:
* Drop Fixes: tag.
* Add x8 DRAM device case.

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 66 ++-
 1 file changed, 52 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 0e8b2137edbb..001dc85122e9 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2837,6 +2837,49 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
return nr_pages;
 }
 
+static int init_csrows_df(struct mem_ctl_info *mci)
+{
+   struct amd64_pvt *pvt = mci->pvt_info;
+   enum edac_type edac_mode = EDAC_NONE;
+   enum dev_type dev_type = DEV_UNKNOWN;
+   struct dimm_info *dimm;
+   int empty = 1;
+   u8 umc, cs;
+
+   if (mci->edac_ctl_cap & EDAC_FLAG_S16ECD16ED) {
+   edac_mode = EDAC_S16ECD16ED;
+   dev_type = DEV_X16;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_S8ECD8ED) {
+   edac_mode = EDAC_S8ECD8ED;
+   dev_type = DEV_X8;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED) {
+   edac_mode = EDAC_S4ECD4ED;
+   dev_type = DEV_X4;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED) {
+   edac_mode = EDAC_SECDED;
+   }
+
+   for_each_umc(umc) {
+   for_each_chip_select(cs, umc, pvt) {
+   if (!csrow_enabled(cs, umc, pvt))
+   continue;
+
+   empty = 0;
+   dimm = mci->csrows[cs]->channels[umc]->dimm;
+
+   edac_dbg(1, "MC node: %d, csrow: %d\n",
+   pvt->mc_node_id, cs);
+
+   dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+   dimm->mtype = pvt->dram_type;
+   dimm->edac_mode = edac_mode;
+   dimm->dtype = dev_type;
+   }
+   }
+
+   return empty;
+}
+
 /*
  * Initialize the array of csrow attribute instances, based on the values
  * from pci config hardware registers.
@@ -2851,15 +2894,16 @@ static int init_csrows(struct mem_ctl_info *mci)
int nr_pages = 0;
u32 val;
 
-   if (!pvt->umc) {
-   amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
+   if (pvt->umc)
+   return init_csrows_df(mci);
+
+   amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
 
-   pvt->nbcfg = val;
+   pvt->nbcfg = val;
 
-   edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: 
%d|DramEccEn: %d]\n",
-pvt->mc_node_id, val,
-!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
-   }
+   edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n",
+pvt->mc_node_id, val,
+!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
 
/*
 * We iterate over DCT0 here but we look at DCT1 in parallel, if needed.
@@ -2896,13 +2940,7 @@ static int init_csrows(struct mem_ctl_info *mci)
edac_dbg(1, "Total csrow%d pages: %u\n", i, nr_pages);
 
/* Determine DIMM ECC mode: */
-   if (pvt->umc) {
-   if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED)
-   edac_mode = EDAC_S4ECD4ED;
-   else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED)
-   edac_mode = EDAC_SECDED;
-
-   } else if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
+   if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
edac_mode = (pvt->nbcfg & NBCFG_CHIPKILL)
? EDAC_S4ECD4ED
: EDAC_SECDED;
-- 
2.17.1



[RFC PATCH v3 09/10] EDAC/amd64: Use cached data when checking for ECC

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

...now that the data is available earlier.

Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 84832771dec0..c1cb0234f085 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3183,31 +3183,27 @@ static const char *ecc_msg =
"'ecc_enable_override'.\n"
" (Note that use of the override may cause unknown side effects.)\n";
 
-static bool ecc_enabled(struct pci_dev *F3, u16 nid)
+static bool ecc_enabled(struct amd64_pvt *pvt)
 {
+   u16 nid = pvt->mc_node_id;
bool nb_mce_en = false;
u8 ecc_en = 0, i;
u32 value;
 
if (boot_cpu_data.x86 >= 0x17) {
u8 umc_en_mask = 0, ecc_en_mask = 0;
+   struct amd64_umc *umc;
 
for_each_umc(i) {
-   u32 base = get_umc_base(i);
+   umc = &pvt->umc[i];
 
/* Only check enabled UMCs. */
-   if (amd_smn_read(nid, base + UMCCH_SDP_CTRL, &value))
-   continue;
-
-   if (!(value & UMC_SDP_INIT))
+   if (!(umc->sdp_ctrl & UMC_SDP_INIT))
continue;
 
umc_en_mask |= BIT(i);
 
-   if (amd_smn_read(nid, base + UMCCH_UMC_CAP_HI, &value))
-   continue;
-
-   if (value & UMC_ECC_ENABLED)
+   if (umc->umc_cap_hi & UMC_ECC_ENABLED)
ecc_en_mask |= BIT(i);
}
 
@@ -3220,7 +3216,7 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
/* Assume UMC MCA banks are enabled. */
nb_mce_en = true;
} else {
-   amd64_read_pci_cfg(F3, NBCFG, &value);
+   amd64_read_pci_cfg(pvt->F3, NBCFG, &value);
 
ecc_en = !!(value & NBCFG_ECC_ENABLE);
 
@@ -3539,7 +3535,7 @@ static int probe_one_instance(unsigned int nid)
if (ret < 0)
goto err_enable;
 
-   if (!ecc_enabled(F3, nid)) {
+   if (!ecc_enabled(pvt)) {
ret = 0;
 
if (!ecc_enable_override)
-- 
2.17.1



[RFC PATCH v3 08/10] EDAC/amd64: Gather hardware information early

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

Split out gathering hardware information from init_one_instance() into a
separate function get_hardware_info().

This is necessary so that the information can be cached earlier and used
to check if memory is populated and if ECC is enabled on a node.

Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 76 +++
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4d1e6daa7ec4..84832771dec0 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3405,34 +3405,17 @@ static void compute_num_umcs(void)
edac_dbg(1, "Number of UMCs: %x", num_umcs);
 }
 
-static int init_one_instance(unsigned int nid)
+static int get_hardware_info(struct amd64_pvt *pvt,
+struct amd64_family_type *fam_type)
 {
-   struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
-   struct amd64_family_type *fam_type = NULL;
-   struct mem_ctl_info *mci = NULL;
-   struct edac_mc_layer layers[2];
-   struct amd64_pvt *pvt = NULL;
u16 pci_id1, pci_id2;
-   int err = 0, ret;
-
-   ret = -ENOMEM;
-   pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
-   if (!pvt)
-   goto err_ret;
-
-   pvt->mc_node_id = nid;
-   pvt->F3 = F3;
-
-   ret = -EINVAL;
-   fam_type = per_family_init(pvt);
-   if (!fam_type)
-   goto err_free;
+   int ret = -EINVAL;
 
if (pvt->fam >= 0x17) {
pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL);
if (!pvt->umc) {
ret = -ENOMEM;
-   goto err_free;
+   goto err_ret;
}
 
pci_id1 = fam_type->f0_id;
@@ -3442,18 +3425,34 @@ static int init_one_instance(unsigned int nid)
pci_id2 = fam_type->f2_id;
}
 
-   err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
-   if (err)
+   ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
+   if (ret)
goto err_post_init;
 
read_mc_regs(pvt);
 
+   return 0;
+
+err_post_init:
+   if (pvt->fam >= 0x17)
+   kfree(pvt->umc);
+
+err_ret:
+   return ret;
+}
+
+static int init_one_instance(struct amd64_pvt *pvt,
+struct amd64_family_type *fam_type)
+{
+   struct mem_ctl_info *mci = NULL;
+   struct edac_mc_layer layers[2];
+   int ret = -EINVAL;
+
/*
 * We need to determine how many memory channels there are. Then use
 * that information for calculating the size of the dynamic instance
 * tables in the 'mci' structure.
 */
-   ret = -EINVAL;
pvt->channel_count = pvt->ops->early_channel_count(pvt);
if (pvt->channel_count < 0)
goto err_siblings;
@@ -3478,7 +3477,7 @@ static int init_one_instance(unsigned int nid)
layers[1].size = 2;
layers[1].is_virt_csrow = false;
 
-   mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0);
+   mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
if (!mci)
goto err_siblings;
 
@@ -3504,20 +3503,17 @@ static int init_one_instance(unsigned int nid)
 err_siblings:
free_mc_sibling_devs(pvt);
 
-err_post_init:
if (pvt->fam >= 0x17)
kfree(pvt->umc);
 
-err_free:
-   kfree(pvt);
-
-err_ret:
return ret;
 }
 
 static int probe_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
+   struct amd64_family_type *fam_type = NULL;
+   struct amd64_pvt *pvt = NULL;
struct ecc_settings *s;
int ret;
 
@@ -3528,6 +3524,21 @@ static int probe_one_instance(unsigned int nid)
 
ecc_stngs[nid] = s;
 
+   pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
+   if (!pvt)
+   goto err_settings;
+
+   pvt->mc_node_id = nid;
+   pvt->F3 = F3;
+
+   fam_type = per_family_init(pvt);
+   if (!fam_type)
+   goto err_enable;
+
+   ret = get_hardware_info(pvt, fam_type);
+   if (ret < 0)
+   goto err_enable;
+
if (!ecc_enabled(F3, nid)) {
ret = 0;
 
@@ -3544,7 +3555,7 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}
 
-   ret = init_one_instance(nid);
+   ret = init_one_instance(pvt, fam_type);
if (ret < 0) {
amd64_err("Error probing instance: %d\n", nid);
 
@@ -3557,6 +3568,9 @@ static int probe_one_instance(unsigned int nid)
return ret;
 
 err_enable:
+   kfree(pvt);
+
+err_settings:
kfree(s);
ecc_stngs[nid] = NULL;
 
-- 
2.17.1



[RFC PATCH v3 10/10] EDAC/amd64: Check for memory before fully initializing an instance

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

Return early before checking for ECC if the node does not have any
populated memory.

Free any cached hardware data before returning. Also, return 0 in this
case since this is not a failure. Other nodes may have memory and the
module should attempt to load an instance for them.

Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index c1cb0234f085..7230ed4ff665 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3505,6 +3505,23 @@ static int init_one_instance(struct amd64_pvt *pvt,
return ret;
 }
 
+static bool instance_has_memory(struct amd64_pvt *pvt)
+{
+   bool cs_enabled = false;
+   int num_channels = 2;
+   int cs = 0, dct = 0;
+
+   if (pvt->umc)
+   num_channels = num_umcs;
+
+   for (dct = 0; dct < num_channels; dct++) {
+   for_each_chip_select(cs, dct, pvt)
+   cs_enabled |= csrow_enabled(cs, dct, pvt);
+   }
+
+   return cs_enabled;
+}
+
 static int probe_one_instance(unsigned int nid)
 {
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
@@ -3535,6 +3552,10 @@ static int probe_one_instance(unsigned int nid)
if (ret < 0)
goto err_enable;
 
+   ret = 0;
+   if (!instance_has_memory(pvt))
+   goto err_enable;
+
if (!ecc_enabled(pvt)) {
ret = 0;
 
-- 
2.17.1



[PATCH v3 5/8] EDAC/amd64: Decode syndrome before translating address

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.

Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.

Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190709215643.171078-6-yazen.ghan...@amd.com

v2->v3:
* No change.

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index c4f2d7c59b04..de141de7b2e5 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2574,13 +2574,6 @@ static void decode_umc_error(int node_id, struct mce *m)
 
err.channel = find_umc_channel(m);
 
-   if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
-   err.err_code = ERR_NORM_ADDR;
-   goto log_error;
-   }
-
-   error_address_to_page_and_offset(sys_addr, &err);
-
if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
goto log_error;
@@ -2597,6 +2590,13 @@ static void decode_umc_error(int node_id, struct mce *m)
 
err.csrow = m->synd & 0x7;
 
+   if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+   err.err_code = ERR_NORM_ADDR;
+   goto log_error;
+   }
+
+   error_address_to_page_and_offset(sys_addr, &err);
+
 log_error:
__log_ecc_error(mci, &err, ecc_type);
 }
-- 
2.17.1



[PATCH v3 4/8] EDAC/amd64: Find Chip Select memory size using Address Mask

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a couple of cases.

1) For single-rank and dual-rank non-interleaved, use the address mask
plus 1 as the size.

2) For dual-rank interleaved, do #1 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case,
the original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0x9FE, De-interleaved=0x3FE
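
As a sanity check of the scheme above, here is a minimal user-space
sketch of the de-interleaving step (an illustration only, not part of
the patch; the kernel code uses fls(), hweight_long() and GENMASK_ULL(),
which are open-coded here with GCC builtins):

  #include <stdint.h>
  #include <stdio.h>

  /* Count the zero bits in the middle of a (non-zero) chip select
   * address mask and drop that many bits from the top. Bit 0 is not
   * part of the mask, so the contiguous result runs down to bit 1. */
  static uint32_t deinterleave_addr_mask(uint32_t mask)
  {
          uint32_t msb = 31 - __builtin_clz(mask);       /* fls(mask) - 1 */
          uint32_t weight = __builtin_popcount(mask);    /* hweight32()   */
          uint32_t num_zero_bits = msb - weight;

          /* GENMASK(msb - num_zero_bits, 1) */
          return (uint32_t)((1ULL << (msb - num_zero_bits + 1)) - 1) & ~1U;
  }

  int main(void)
  {
          /* The example from above: prints 0x3FE. */
          printf("0x%X\n", deinterleave_addr_mask(0x9FE));
          return 0;
  }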

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190709215643.171078-5-yazen.ghan...@amd.com

v2->v3:
* Drop Fixes: tag.
* Add checks to only return CS size for enabled CSes.

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 114 +++---
 1 file changed, 70 insertions(+), 44 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 001dc85122e9..c4f2d7c59b04 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -788,51 +788,39 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 (dclr & BIT(15)) ?  "yes" : "no");
 }
 
-/*
- * The Address Mask should be a contiguous set of bits in the non-interleaved
- * case. So to check for CS interleaving, find the most- and least-significant
- * bits of the mask, generate a contiguous bitmask, and compare the two.
- */
-static bool f17_cs_interleaved(struct amd64_pvt *pvt, u8 ctrl, int cs)
+#define CS_EVEN_PRIMARYBIT(0)
+#define CS_ODD_PRIMARY BIT(1)
+
+#define CS_EVENCS_EVEN_PRIMARY
+#define CS_ODD CS_ODD_PRIMARY
+
+static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
-   u32 mask = pvt->csels[ctrl].csmasks[cs >> 1];
-   u32 msb = fls(mask) - 1, lsb = ffs(mask) - 1;
-   u32 test_mask = GENMASK(msb, lsb);
+   int cs_mode = 0;
 
-   edac_dbg(1, "mask=0x%08x test_mask=0x%08x\n", mask, test_mask);
+   if (csrow_enabled(2 * dimm, ctrl, pvt))
+   cs_mode |= CS_EVEN_PRIMARY;
 
-   return mask ^ test_mask;
+   if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
+   cs_mode |= CS_ODD_PRIMARY;
+
+   return cs_mode;
 }
 
 static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 {
-   int dimm, size0, size1, cs0, cs1;
+   int dimm, size0, size1, cs0, cs1, cs_mode;
 
edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
for (dimm = 0; dimm < 2; dimm++) {
-   size0 = 0;
cs0 = dimm * 2;
-
-   if (csrow_enabled(cs0, ctrl, pvt))
-   size0 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs0);
-
-   size1 = 0;
cs1 = dimm * 2 + 1;
 
-   if (csrow_enabled(cs1, ctrl, pvt)) {
-   /*
-* CS interleaving is only supported if both CSes have
-* the same amount of memory. Because they are
-* interleaved, it will look like both CSes have the
-* full amount of memory. Save the size for both as
-* half the amount we found on CS0, if interleaved.
-*/
-   if (f17_cs_interleaved(pvt, ctrl, cs1))
-   size1 = size0 = (size0 >> 1);
-   else
-   size1 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs1);
-   }
+   cs_mode = f17_get_cs_mode(dimm, ctrl, pvt);
+
+   size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+   size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);
 
amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
cs0,size0,
@@ -1569,18 +1557,54 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
return ddr3_cs_size(cs_mode, false);
 }
 
-static int f17_base_addr_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
unsigned int cs_mode, int csrow_nr)
 {
-   u32 base_addr = pvt->csels[umc].csbases[csrow_nr];
+   u32 addr_mask_orig, addr_mask_deinterleaved;
+   u32 msb, weight, num_zero_bits;
+   int dimm, size = 0;
 
-   /*  Each mask is used for every two base addresses. */
-   u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr >> 1];
+   /* No Chip Selects are enabled. */
+   if (!cs_mode)
+

[PATCH v3 0/8] AMD64 EDAC fixes

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

Hi Boris,

This set contains a few fixes for some changes merged in v5.2. There
are also a couple of fixes for older issues. In addition, there are a
couple of patches to add support for Asymmetric Dual-Rank DIMMs.

I don't have the failing config that you used readily available, but I
believe I found the issue. Please let me know how it goes.

I've also added RFC patches to avoid the "ECC disabled" message for
nodes without memory. I haven't fully tested these, but I wanted to get
your thoughts. Here's an earlier discussion:
https://lkml.kernel.org/r/20180321191335.7832-1-yazen.ghan...@amd.com

Thanks,
Yazen

Link:
https://lkml.kernel.org/r/20190709215643.171078-1-yazen.ghan...@amd.com

v2->v3:
* Drop Fixes: tags in patch 1.
* Add detection of x8 DRAM devices in patches 2 and 3.
* Fix Chip Select size printing in patch 4.
* Added RFC patches to avoid "ECC disabled" message for nodes without memory.

v1->v2:
* Squash patches 1 and 2 together.


Yazen Ghannam (10):
  EDAC/amd64: Support more than two controllers for chip selects
handling
  EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  EDAC/amd64: Initialize DIMM info for systems with more than two
channels
  EDAC/amd64: Find Chip Select memory size using Address Mask
  EDAC/amd64: Decode syndrome before translating address
  EDAC/amd64: Cache secondary Chip Select registers
  EDAC/amd64: Support Asymmetric Dual-Rank DIMMs
  EDAC/amd64: Gather hardware information early
  EDAC/amd64: Use cached data when checking for ECC
  EDAC/amd64: Check for memory before fully initializing an instance

 drivers/edac/amd64_edac.c | 371 +-
 drivers/edac/amd64_edac.h |   9 +-
 2 files changed, 251 insertions(+), 129 deletions(-)

-- 
2.17.1



[PATCH v3 1/8] EDAC/amd64: Support more than two controllers for chip selects handling

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

This is a second version of a commit that was reverted.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190709215643.171078-2-yazen.ghan...@amd.com

v2->v3:
* Drop Fixes: tags.

v1->v2:
* Patches 1 and 2 squashed together.

 drivers/edac/amd64_edac.c | 123 +-
 drivers/edac/amd64_edac.h |   5 +-
 2 files changed, 71 insertions(+), 57 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 873437be86d9..dd60cf5a3d96 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -810,7 +810,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 
edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
-   for (dimm = 0; dimm < 4; dimm++) {
+   for (dimm = 0; dimm < 2; dimm++) {
size0 = 0;
cs0 = dimm * 2;
 
@@ -942,89 +942,102 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
} else if (pvt->fam == 0x15 && pvt->model == 0x30) {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+   } else if (pvt->fam >= 0x17) {
+   int umc;
+
+   for_each_umc(umc) {
+   pvt->csels[umc].b_cnt = 4;
+   pvt->csels[umc].m_cnt = 2;
+   }
+
} else {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
}
 }
 
+static void read_umc_base_mask(struct amd64_pvt *pvt)
+{
+   u32 umc_base_reg, umc_mask_reg;
+   u32 base_reg, mask_reg;
+   u32 *base, *mask;
+   int cs, umc;
+
+   for_each_umc(umc) {
+   umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+
+   for_each_chip_select(cs, umc, pvt) {
+   base = &pvt->csels[umc].csbases[cs];
+
+   base_reg = umc_base_reg + (cs * 4);
+
+   if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+   edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *base, base_reg);
+   }
+
+   umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+
+   for_each_chip_select_mask(cs, umc, pvt) {
+   mask = &pvt->csels[umc].csmasks[cs];
+
+   mask_reg = umc_mask_reg + (cs * 4);
+
+   if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+   edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *mask, mask_reg);
+   }
+   }
+}
+
 /*
  * Function 2 Offset F10_DCSB0; read in the DCS Base and DCS Mask registers
  */
 static void read_dct_base_mask(struct amd64_pvt *pvt)
 {
-   int base_reg0, base_reg1, mask_reg0, mask_reg1, cs;
+   int cs;
 
prep_chip_selects(pvt);
 
-   if (pvt->umc) {
-   base_reg0 = get_umc_base(0) + UMCCH_BASE_ADDR;
-   base_reg1 = get_umc_base(1) + UMCCH_BASE_ADDR;
-   mask_reg0 = get_umc_base(0) + UMCCH_ADDR_MASK;
-   mask_reg1 = get_umc_base(1) + UMCCH_ADDR_MASK;
-   } else {
-   base_reg0 = DCSB0;
-   base_reg1 = DCSB1;
-   mask_reg0 = DCSM0;
-   mask_reg1 = DCSM1;
-   }
+   if (pvt->umc)
+   return read_umc_base_mask(pvt);
 
for_each_chip_select(cs, 0, pvt) {
-   int reg0   = base_reg0 + (cs * 4);
-   int reg1   = base_reg1 + (cs * 4);
+   int reg0   = DCSB0 + (cs * 4);
+   int reg1   = DCSB1 + (cs * 4);
u32 *base0 = &pvt->csels[0].csbases[cs];
u32 *base1 = &pvt->csels[1].csbases[cs];
 
-   if (pvt->umc) {
-   if (!amd_smn_read(pvt->mc_node_id, reg0, base0))
-   edac_dbg(0, "  DCSB0[%d]=0x%08x reg: 0x%x\n",
-cs, *base0, reg0);
-
-   if (!amd_smn_read(pvt->mc_node_id, reg1, base1))
-   edac_dbg(0, "  

[PATCH v3 6/8] EDAC/amd64: Cache secondary Chip Select registers

2019-08-21 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems have a set of secondary Chip Select Base
Addresses and Address Masks. These do not represent unique Chip
Selects, rather they are used in conjunction with the primary
Chip Select registers in certain use cases.

Cache these secondary Chip Select registers for future use.
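
For orientation, the secondary registers are addressed the same way as
the primary set, just starting at the new _SEC offsets. A rough sketch
of the SMN address computation, where umc_base stands in for what
get_umc_base() returns in the driver (illustration only; the offsets
are the ones this patch adds to amd64_edac.h):

  #include <stdint.h>

  #define UMCCH_BASE_ADDR         0x0
  #define UMCCH_BASE_ADDR_SEC     0x10
  #define UMCCH_ADDR_MASK         0x20
  #define UMCCH_ADDR_MASK_SEC     0x28

  /* One 32-bit base register per chip select, 4 bytes apart. */
  static uint32_t cs_base_reg(uint32_t umc_base, int cs, int sec)
  {
          return umc_base + (sec ? UMCCH_BASE_ADDR_SEC : UMCCH_BASE_ADDR) + cs * 4;
  }

  /* Same layout for the primary/secondary address mask registers. */
  static uint32_t cs_mask_reg(uint32_t umc_base, int cs_mask, int sec)
  {
          return umc_base + (sec ? UMCCH_ADDR_MASK_SEC : UMCCH_ADDR_MASK) + cs_mask * 4;
  }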

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190709215643.171078-7-yazen.ghan...@amd.com

v2->v3:
* No change.

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 23 ---
 drivers/edac/amd64_edac.h |  4 
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index de141de7b2e5..26ce48fcaf00 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -946,34 +946,51 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
-   u32 umc_base_reg, umc_mask_reg;
-   u32 base_reg, mask_reg;
-   u32 *base, *mask;
+   u32 umc_base_reg, umc_base_reg_sec;
+   u32 umc_mask_reg, umc_mask_reg_sec;
+   u32 base_reg, base_reg_sec;
+   u32 mask_reg, mask_reg_sec;
+   u32 *base, *base_sec;
+   u32 *mask, *mask_sec;
int cs, umc;
 
for_each_umc(umc) {
umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+   umc_base_reg_sec = get_umc_base(umc) + UMCCH_BASE_ADDR_SEC;
 
for_each_chip_select(cs, umc, pvt) {
base = &pvt->csels[umc].csbases[cs];
+   base_sec = &pvt->csels[umc].csbases_sec[cs];
 
base_reg = umc_base_reg + (cs * 4);
+   base_reg_sec = umc_base_reg_sec + (cs * 4);
 
if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
 umc, cs, *base, base_reg);
+
+   if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, base_sec))
+   edac_dbg(0, "DCSB_SEC%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *base_sec, base_reg_sec);
}
 
umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+   umc_mask_reg_sec = get_umc_base(umc) + UMCCH_ADDR_MASK_SEC;
 
for_each_chip_select_mask(cs, umc, pvt) {
mask = &pvt->csels[umc].csmasks[cs];
+   mask_sec = &pvt->csels[umc].csmasks_sec[cs];
 
mask_reg = umc_mask_reg + (cs * 4);
+   mask_reg_sec = umc_mask_reg_sec + (cs * 4);
 
if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
 umc, cs, *mask, mask_reg);
+
+   if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, mask_sec))
+   edac_dbg(0, "DCSM_SEC%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *mask_sec, mask_reg_sec);
}
}
 }
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 4dce6a2ac75f..68f12de6e654 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -259,7 +259,9 @@
 
 /* UMC CH register offsets */
 #define UMCCH_BASE_ADDR0x0
+#define UMCCH_BASE_ADDR_SEC0x10
 #define UMCCH_ADDR_MASK0x20
+#define UMCCH_ADDR_MASK_SEC0x28
 #define UMCCH_ADDR_CFG 0x30
 #define UMCCH_DIMM_CFG 0x80
 #define UMCCH_UMC_CFG  0x100
@@ -312,9 +314,11 @@ struct dram_range {
 /* A DCT chip selects collection */
 struct chip_select {
u32 csbases[NUM_CHIPSELECTS];
+   u32 csbases_sec[NUM_CHIPSELECTS];
u8 b_cnt;
 
u32 csmasks[NUM_CHIPSELECTS];
+   u32 csmasks_sec[NUM_CHIPSELECTS];
u8 m_cnt;
 };
 
-- 
2.17.1



RE: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP

2019-08-19 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, August 2, 2019 2:42 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with 
> EDAC_CTL_CAP
> 
> On Tue, Jul 09, 2019 at 09:56:55PM +, Ghannam, Yazen wrote:
> > From: Yazen Ghannam 
> >
> > AMD Family 17h systems support x4 and x16 DRAM devices. However, the
> > device type is not checked when setting EDAC_CTL_CAP.
> >
> > Set the appropriate EDAC_CTL_CAP flag based on the device type.
> >
> > Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on 
> > Fam17h")
> 
> This is better: a patch which fixes a previous patch and is simple,
> small and clear. That you can tag with Fixes: just fine.
> 
> > Signed-off-by: Yazen Ghannam 
> > ---
> > Link:
> > https://lkml.kernel.org/r/20190531234501.32826-4-yazen.ghan...@amd.com
> >
> > v1->v2:
> > * No change.
> >
> >  drivers/edac/amd64_edac.c | 13 ++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index dd60cf5a3d96..125d6e2a828e 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
> >  static inline void
> >  f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt 
> > *pvt)
> >  {
> > -   u8 i, ecc_en = 1, cpk_en = 1;
> > +   u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
> >
> > for_each_umc(i) {
> > if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
> > ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
> > cpk_en &= !!(pvt->umc[i].umc_cap_hi & 
> > UMC_ECC_CHIPKILL_CAP);
> > +
> > +   dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
> > +   dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
> 
> Are those bits mutually exclusive?
> 
> I.e., so that you can do:
> 
>   if (dev_x4)
>   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
>   else
>   mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
> 
> ?
> 

I don't think so. I believe they can both be zero. I'll verify and make the 
change if they are mutually exclusive.

Thanks,
Yazen



RE: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling

2019-08-19 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Friday, August 2, 2019 1:50 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for 
> chip selects handling
> 
> On Tue, Jul 09, 2019 at 09:56:54PM +, Ghannam, Yazen wrote:
> > From: Yazen Ghannam 
> >
> > The struct chip_select array that's used for saving chip select bases
> > and masks is fixed at length of two. There should be one struct
> > chip_select for each controller, so this array should be increased to
> > support systems that may have more than two controllers.
> >
> > Increase the size of the struct chip_select array to eight, which is the
> > largest number of controllers per die currently supported on AMD
> > systems.
> >
> > Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
> > Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
> > channel.
> >
> > Also, carve out the Family 17h+ reading of the bases/masks into a
> > separate function. This effectively reverts the original bases/masks
> > reading code to before Family 17h support was added.
> >
> > This is a second version of a commit that was reverted.
> >
> > Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
> > Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers 
> > for chip select handling"")
> 
> I'm not sure about those Fixes: tags you're slapping everywhere. First
> of all, 8de9930a4618 is a revert so how can this be fixing a revert? If
> anything, it should be fixing the original commit
> 
>   0a227af521d6 ("EDAC/amd64: Support more than two controllers for chip 
> select handling")
> 
> which tried the more-than-2-memory-controllers thing.
> 
> But, it is not really a fix for that commit but a second attempt at it.
> Which is not really a fix but hw enablement.
> 
> So I'm dropping those tags here. If you want them in stable, pls
> backport them properly and test them on the respective stable kernels
> before sending them to stable.
> 

Okay, no problem.

Should I drop the Fixes tags on any other of the patches in this set?

Thanks,
Yazen


RE: [PATCH v2 0/7] AMD64 EDAC fixes

2019-08-15 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, August 2, 2019 9:46 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 0/7] AMD64 EDAC fixes
> 
...
> 
> So this still has this confusing reporting of unpopulated nodes:
> 
> [4.291774] EDAC MC1: Giving out device to module amd64_edac controller 
> F17h: DEV :00:19.3 (INTERRUPT)
> [4.292021] EDAC DEBUG: ecc_enabled: Node 2: No enabled UMCs.
> [4.292231] EDAC amd64: Node 2: DRAM ECC disabled.
> [4.292405] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
> module will not load.
> [4.292859] EDAC DEBUG: ecc_enabled: Node 3: No enabled UMCs.
> [4.292963] EDAC amd64: Node 3: DRAM ECC disabled.
> [4.293063] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
> module will not load.
> [4.293347] AMD64 EDAC driver v3.5.0
> 
> which needs fixing.
> 

Yes, I agree. I was planning to do a fix in a separate set. Is that okay? Or 
should I add it here?

> Regardless, still not good enough. The snowy owl box I have here has 16
> GB:
> 
> $ head -n1 /proc/meminfo
> MemTotal:   15715328 kB
> 
> and yet
> 
> [4.282251] EDAC MC: UMC0 chip selects:
> [4.282348] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
> [4.282455] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 
> 0x1fe
> [4.282592] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved 
> AddrMask: 0x1fe
> [4.282732] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
> [4.282839] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 
> 0x1fe
> [4.283060] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved 
> AddrMask: 0x1fe
> [4.283286] EDAC amd64: MC: 0:  8191MB 1:  8191MB
>  ^
> 
> [4.283456] EDAC amd64: MC: 2: 0MB 3: 0MB
> 
> ...
> 
> [4.285379] EDAC MC: UMC1 chip selects:
> [4.285476] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
> [4.285583] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 
> 0x1fe
> [4.285721] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved 
> AddrMask: 0x1fe
> [4.285860] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
> [4.285967] EDAC DEBUG: f17_addr_mask_to_cs_size:   Original AddrMask: 
> 0x1fe
> [4.286105] EDAC DEBUG: f17_addr_mask_to_cs_size:   Deinterleaved 
> AddrMask: 0x1fe
> [4.286244] EDAC amd64: MC: 0:  8191MB 1:  8191MB
>  ^
> 
> [4.286345] EDAC amd64: MC: 2: 0MB 3: 0MB
> 
> which shows 4 chip selects x 8Gb = 32G.
> 
> So something's still wrong. Before the patchset it says:
> 
> EDAC MC: UMC0 chip selects:
> EDAC amd64: MC: 0:  8192MB 1: 0MB
> ...
> EDAC MC: UMC1 chip selects:
> EDAC amd64: MC: 0:  8192MB 1: 0MB
> 
> which is the correct output.
> 

Can you please send me the full kernel log and dmidecode output?

Thanks,
Yazen


RE: [PATCHv3 0/6] CPPC optional registers AMD support

2019-07-15 Thread Ghannam, Yazen
> -Original Message-
> From: Peter Zijlstra 
> Sent: Saturday, July 13, 2019 5:46 AM
> To: Natarajan, Janakarajan 
> Cc: linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> linux...@vger.kernel.org; de...@acpica.org; Rafael J . Wysocki
> ; Len Brown ; Viresh Kumar 
> ; Robert Moore
> ; Erik Schmauss ; Ghannam, 
> Yazen 
> Subject: Re: [PATCHv3 0/6] CPPC optional registers AMD support
> 
> On Wed, Jul 10, 2019 at 06:37:09PM +, Natarajan, Janakarajan wrote:
> > CPPC (Collaborative Processor Performance Control) offers optional
> > registers which can be used to tune the system based on energy and/or
> > performance requirements.
> >
> > Newer AMD processors (>= Family 17h) add support for a subset of these
> > optional CPPC registers, based on ACPI v6.1.
> >
> > The following are the supported CPPC registers for which sysfs entries
> > are created:
> > * enable(NEW)
> > * max_perf  (NEW)
> > * min_perf  (NEW)
> > * energy_perf
> > * lowest_perf
> > * nominal_perf
> > * desired_perf  (NEW)
> > * feedback_ctrs
> > * auto_sel_enable   (NEW)
> > * lowest_nonlinear_perf
> >
> > First, update cppc_acpi to create sysfs entries only when the optional
> > registers are known to be supported.
> >
> > Next, a new CPUFreq driver is introduced to enable the OSPM and the 
> > userspace
> > to access the newly supported registers through sysfs entries found in
> > /sys/devices/system/cpu/cpu/amd_cpufreq/.
> >
> > This new CPUFreq driver can only be used by providing a module parameter,
> > amd_cpufreq.cppc_enable=1.
> >
> > The purpose of exposing the registers via the amd-cpufreq sysfs entries is 
> > to
> > allow the userspace to:
> > * Tweak the values to fit its workload.
> > * Apply a profile from AMD's optimization guides.
> 
> So in general I think it is a huge mistake to expose all that to
> userspace. Before you know it, there's tools that actually rely on it,
> and then inhibit the kernel from doing anything sane with it.
> 

Okay, makes sense.

Is there any way to expose a sysfs interface and make it explicitly 
"experimental"? Maybe putting it in Documentation/ABI/testing/?

Or do you think it's just not worth it?

> > Profiles will be documented in the performance/optimization guides.
> 
> I don't think userspace can really do anything sane with this; it lacks
> much if not all useful information.
> 
> > Note:
> > * AMD systems will not have a policy applied in the kernel at this time.
> 
> And why the heck not? We're trying to move all cpufreq into the
> scheduler and have only a single governor, namely schedutil -- yes,
> we're still stuck with legacy, and we're still working on performance
> parity in some cases, but I really hope to get rid of all other cpufreq
> governors eventually.
> 

Because this is new to AMD systems, we didn't want to enforce a default policy.

We figured that exposing the CPPC interface would be a good way to decouple 
policy from the kernel and let users experiment/tune their systems, like using 
the userspace governor. And if some pattern emerged then we could make that a 
default policy in the kernel (for AMD or in general).

But you're saying we should focus more on working with the schedutil governor, 
correct? Do you think there's still a use for a userspace governor?

> And if you look at schedutil (schedutil_cpu_util in specific) then
> you'll see it is already prepared for CPPC and currently only held back
> by the generic cpufreq interface.
> 
> It currently only sets desired freq, it has information for
> min/guaranteed, and once we get thermal intergrated we might have
> sensible data for max freq too.
> 

Will do.

> > TODO:
> > * Create a linux userspace tool that will help users generate a CPPC profile
> >   for their target workload.
> 
> Basically a big fat NAK for this approach to cpufreq.
> 

Is that for exposing the sysfs interface, having a stub driver, or both?

Would it be better to have a cpufreq driver that implements some policy rather 
than just providing the sysfs interface?

> > * Create a general CPPC policy in the kernel.
> 
> We already have that, sorta.

Right, but it seems to still be focused on CPU frequency rather than abstract 
performance like how CPPC is defined.

This is another reason for exposing the CPPC interface directly. We'll give 
users the ability to interact with the platform, using CPPC, without having to 
follow the CPUFREQ paradigm.

Do you think this is doable? Or should we always have some kernel interaction 
because of the scheduler, etc.?

Thanks,
Yazen


[PATCH v2 5/7] EDAC/amd64: Decode syndrome before translating address

2019-07-09 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.

Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.

Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-7-yazen.ghan...@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f0424c10cac0..4058b24b8e04 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2567,13 +2567,6 @@ static void decode_umc_error(int node_id, struct mce *m)
 
err.channel = find_umc_channel(m);
 
-   if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
-   err.err_code = ERR_NORM_ADDR;
-   goto log_error;
-   }
-
-   error_address_to_page_and_offset(sys_addr, &err);
-
if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
goto log_error;
@@ -2590,6 +2583,13 @@ static void decode_umc_error(int node_id, struct mce *m)
 
err.csrow = m->synd & 0x7;
 
+   if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+   err.err_code = ERR_NORM_ADDR;
+   goto log_error;
+   }
+
+   error_address_to_page_and_offset(sys_addr, &err);
+
 log_error:
__log_ecc_error(mci, &err, ecc_type);
 }
-- 
2.17.1



[PATCH v2 6/7] EDAC/amd64: Cache secondary Chip Select registers

2019-07-09 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems have a set of secondary Chip Select Base
Addresses and Address Masks. These do not represent unique Chip
Selects, rather they are used in conjunction with the primary
Chip Select registers in certain use cases.

Cache these secondary Chip Select registers for future use.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-8-yazen.ghan...@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 23 ---
 drivers/edac/amd64_edac.h |  4 
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4058b24b8e04..006417cb79dc 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -943,34 +943,51 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
-   u32 umc_base_reg, umc_mask_reg;
-   u32 base_reg, mask_reg;
-   u32 *base, *mask;
+   u32 umc_base_reg, umc_base_reg_sec;
+   u32 umc_mask_reg, umc_mask_reg_sec;
+   u32 base_reg, base_reg_sec;
+   u32 mask_reg, mask_reg_sec;
+   u32 *base, *base_sec;
+   u32 *mask, *mask_sec;
int cs, umc;
 
for_each_umc(umc) {
umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+   umc_base_reg_sec = get_umc_base(umc) + UMCCH_BASE_ADDR_SEC;
 
for_each_chip_select(cs, umc, pvt) {
base = &pvt->csels[umc].csbases[cs];
+   base_sec = &pvt->csels[umc].csbases_sec[cs];
 
base_reg = umc_base_reg + (cs * 4);
+   base_reg_sec = umc_base_reg_sec + (cs * 4);
 
if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
 umc, cs, *base, base_reg);
+
+   if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, base_sec))
+   edac_dbg(0, "DCSB_SEC%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *base_sec, base_reg_sec);
}
 
umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+   umc_mask_reg_sec = get_umc_base(umc) + UMCCH_ADDR_MASK_SEC;
 
for_each_chip_select_mask(cs, umc, pvt) {
mask = &pvt->csels[umc].csmasks[cs];
+   mask_sec = &pvt->csels[umc].csmasks_sec[cs];
 
mask_reg = umc_mask_reg + (cs * 4);
+   mask_reg_sec = umc_mask_reg_sec + (cs * 4);
 
if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
 umc, cs, *mask, mask_reg);
+
+   if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, mask_sec))
+   edac_dbg(0, "DCSM_SEC%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *mask_sec, mask_reg_sec);
}
}
 }
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 4dce6a2ac75f..68f12de6e654 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -259,7 +259,9 @@
 
 /* UMC CH register offsets */
 #define UMCCH_BASE_ADDR0x0
+#define UMCCH_BASE_ADDR_SEC0x10
 #define UMCCH_ADDR_MASK0x20
+#define UMCCH_ADDR_MASK_SEC0x28
 #define UMCCH_ADDR_CFG 0x30
 #define UMCCH_DIMM_CFG 0x80
 #define UMCCH_UMC_CFG  0x100
@@ -312,9 +314,11 @@ struct dram_range {
 /* A DCT chip selects collection */
 struct chip_select {
u32 csbases[NUM_CHIPSELECTS];
+   u32 csbases_sec[NUM_CHIPSELECTS];
u8 b_cnt;
 
u32 csmasks[NUM_CHIPSELECTS];
+   u32 csmasks_sec[NUM_CHIPSELECTS];
u8 m_cnt;
 };
 
-- 
2.17.1



[PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling

2019-07-09 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

This is a second version of a commit that was reverted.

Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-2-yazen.ghan...@amd.com
https://lkml.kernel.org/r/20190531234501.32826-3-yazen.ghan...@amd.com

v1->v2:
* Patches 1 and 2 squashed together.

 drivers/edac/amd64_edac.c | 123 +-
 drivers/edac/amd64_edac.h |   5 +-
 2 files changed, 71 insertions(+), 57 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 873437be86d9..dd60cf5a3d96 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -810,7 +810,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 
edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
-   for (dimm = 0; dimm < 4; dimm++) {
+   for (dimm = 0; dimm < 2; dimm++) {
size0 = 0;
cs0 = dimm * 2;
 
@@ -942,89 +942,102 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
} else if (pvt->fam == 0x15 && pvt->model == 0x30) {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+   } else if (pvt->fam >= 0x17) {
+   int umc;
+
+   for_each_umc(umc) {
+   pvt->csels[umc].b_cnt = 4;
+   pvt->csels[umc].m_cnt = 2;
+   }
+
} else {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
}
 }
 
+static void read_umc_base_mask(struct amd64_pvt *pvt)
+{
+   u32 umc_base_reg, umc_mask_reg;
+   u32 base_reg, mask_reg;
+   u32 *base, *mask;
+   int cs, umc;
+
+   for_each_umc(umc) {
+   umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+
+   for_each_chip_select(cs, umc, pvt) {
+   base = &pvt->csels[umc].csbases[cs];
+
+   base_reg = umc_base_reg + (cs * 4);
+
+   if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+   edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *base, base_reg);
+   }
+
+   umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+
+   for_each_chip_select_mask(cs, umc, pvt) {
+   mask = &pvt->csels[umc].csmasks[cs];
+
+   mask_reg = umc_mask_reg + (cs * 4);
+
+   if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+   edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *mask, mask_reg);
+   }
+   }
+}
+
 /*
  * Function 2 Offset F10_DCSB0; read in the DCS Base and DCS Mask registers
  */
 static void read_dct_base_mask(struct amd64_pvt *pvt)
 {
-   int base_reg0, base_reg1, mask_reg0, mask_reg1, cs;
+   int cs;
 
prep_chip_selects(pvt);
 
-   if (pvt->umc) {
-   base_reg0 = get_umc_base(0) + UMCCH_BASE_ADDR;
-   base_reg1 = get_umc_base(1) + UMCCH_BASE_ADDR;
-   mask_reg0 = get_umc_base(0) + UMCCH_ADDR_MASK;
-   mask_reg1 = get_umc_base(1) + UMCCH_ADDR_MASK;
-   } else {
-   base_reg0 = DCSB0;
-   base_reg1 = DCSB1;
-   mask_reg0 = DCSM0;
-   mask_reg1 = DCSM1;
-   }
+   if (pvt->umc)
+   return read_umc_base_mask(pvt);
 
for_each_chip_select(cs, 0, pvt) {
-   int reg0   = base_reg0 + (cs * 4);
-   int reg1   = base_reg1 + (cs * 4);
+   int reg0   = DCSB0 + (cs * 4);
+   int reg1   = DCSB1 + (cs * 4);
u32 *base0 = &pvt->csels[0].csbases[cs];
u32 *base1 = &pvt->csels[1].csbases[cs];
 
-   if (pvt->umc) {
-   if (!amd_smn_read(pvt->mc_node_id, reg0, base0))
-   edac_dbg(0, "  

[PATCH v2 7/7] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

2019-07-09 Thread Ghannam, Yazen
From: Yazen Ghannam 

Future AMD systems will support "Asymmetric" Dual-Rank DIMMs. These are
DIMMs where the ranks are of different sizes.

The even rank will use the Primary Even Chip Select registers and the
odd rank will use the Secondary Odd Chip Select registers.

Recognize if a Secondary Odd Chip Select is being used. Use the
Secondary Odd Address Mask when calculating the chip select size.
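
In other words, the size calculation only has to pick which mask array
to read. A condensed, self-contained sketch of that selection (names
mirror the diff below; the struct is trimmed to what the example needs,
with amd64_edac.h as the authoritative layout):

  #include <stdint.h>

  #define CS_ODD_SECONDARY        (1 << 2)        /* BIT(2), as below */
  #define NUM_CHIPSELECTS         8

  struct chip_select {
          uint32_t csmasks[NUM_CHIPSELECTS];
          uint32_t csmasks_sec[NUM_CHIPSELECTS];
  };

  /* An odd rank with an enabled secondary chip select reports its size
   * from the secondary address mask instead of the primary one. */
  static uint32_t pick_addr_mask(const struct chip_select *csel,
                                 int cs_mode, int csrow_nr)
  {
          int dimm = csrow_nr >> 1;       /* each mask covers two CS */

          if (cs_mode & CS_ODD_SECONDARY)
                  return csel->csmasks_sec[dimm];

          return csel->csmasks[dimm];
  }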

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-9-yazen.ghan...@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 006417cb79dc..6c284a4f980c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -790,6 +790,9 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 
 #define CS_EVEN_PRIMARYBIT(0)
 #define CS_ODD_PRIMARY BIT(1)
+#define CS_ODD_SECONDARY   BIT(2)
+
+#define csrow_sec_enabled(i, dct, pvt) ((pvt)->csels[(dct)].csbases_sec[(i)] & DCSB_CS_ENABLE)
 
 static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
@@ -801,6 +804,10 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
cs_mode |= CS_ODD_PRIMARY;
 
+   /* Asymmetric Dual-Rank DIMM support. */
+   if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt))
+   cs_mode |= CS_ODD_SECONDARY;
+
return cs_mode;
 }
 
@@ -1590,7 +1597,11 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 */
dimm = csrow_nr >> 1;
 
-   addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+   /* Asymmetric Dual-Rank DIMM support. */
+   if (cs_mode & CS_ODD_SECONDARY)
+   addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
+   else
+   addr_mask_orig = pvt->csels[umc].csmasks[dimm];
 
/*
 * The number of zero bits in the mask is equal to the number of bits
-- 
2.17.1



[PATCH v2 4/7] EDAC/amd64: Find Chip Select memory size using Address Mask

2019-07-09 Thread Ghannam, Yazen
From: Yazen Ghannam 

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a few cases.

1) For single-rank, use the address mask as the size.

2) For dual-rank non-interleaved, use the address mask divided by 2 as
the size.

3) For dual-rank interleaved, do #2 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case, the
original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0x9FE, De-interleaved=0x3FE
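
For reference, the arithmetic behind this example, following the
fls()/hweight() steps in the new f17_addr_mask_to_cs_size() (a
hand-worked illustration, not extra code in the patch):

  Original mask       = 0x9FE = 0b1001_1111_1110   (holes at bits 9-10)
  msb                 = fls(0x9FE) - 1 = 11
  weight              = hweight(0x9FE) = 9 set bits
  num_zero_bits       = msb - weight   = 2
  De-interleaved mask = GENMASK(msb - num_zero_bits, 1)
                      = GENMASK(9, 1)  = 0x3FE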

Fixes: fc00c6a41638 ("EDAC/amd64: Adjust printed chip select sizes when interleaved")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-6-yazen.ghan...@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 107 ++
 1 file changed, 63 insertions(+), 44 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index d0926b181c7c..f0424c10cac0 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -788,51 +788,36 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 (dclr & BIT(15)) ?  "yes" : "no");
 }
 
-/*
- * The Address Mask should be a contiguous set of bits in the non-interleaved
- * case. So to check for CS interleaving, find the most- and least-significant
- * bits of the mask, generate a contiguous bitmask, and compare the two.
- */
-static bool f17_cs_interleaved(struct amd64_pvt *pvt, u8 ctrl, int cs)
+#define CS_EVEN_PRIMARYBIT(0)
+#define CS_ODD_PRIMARY BIT(1)
+
+static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
-   u32 mask = pvt->csels[ctrl].csmasks[cs >> 1];
-   u32 msb = fls(mask) - 1, lsb = ffs(mask) - 1;
-   u32 test_mask = GENMASK(msb, lsb);
+   int cs_mode = 0;
+
+   if (csrow_enabled(2 * dimm, ctrl, pvt))
+   cs_mode |= CS_EVEN_PRIMARY;
 
-   edac_dbg(1, "mask=0x%08x test_mask=0x%08x\n", mask, test_mask);
+   if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
+   cs_mode |= CS_ODD_PRIMARY;
 
-   return mask ^ test_mask;
+   return cs_mode;
 }
 
 static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 {
-   int dimm, size0, size1, cs0, cs1;
+   int dimm, size0, size1, cs0, cs1, cs_mode;
 
edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
for (dimm = 0; dimm < 2; dimm++) {
-   size0 = 0;
cs0 = dimm * 2;
-
-   if (csrow_enabled(cs0, ctrl, pvt))
-   size0 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs0);
-
-   size1 = 0;
cs1 = dimm * 2 + 1;
 
-   if (csrow_enabled(cs1, ctrl, pvt)) {
-   /*
-* CS interleaving is only supported if both CSes have
-* the same amount of memory. Because they are
-* interleaved, it will look like both CSes have the
-* full amount of memory. Save the size for both as
-* half the amount we found on CS0, if interleaved.
-*/
-   if (f17_cs_interleaved(pvt, ctrl, cs1))
-   size1 = size0 = (size0 >> 1);
-   else
-   size1 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs1);
-   }
+   cs_mode = f17_get_cs_mode(dimm, ctrl, pvt);
+
+   size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+   size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);
 
amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
cs0,size0,
@@ -1569,18 +1554,50 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
return ddr3_cs_size(cs_mode, false);
 }
 
-static int f17_base_addr_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
unsigned int cs_mode, int csrow_nr)
 {
-   u32 base_addr = pvt->csels[umc].csbases[csrow_nr];
+   u32 addr_mask_orig, addr_mask_deinterleaved;
+   u32 msb, weight, num_zero_bits;
+   int dimm, dual_rank, size = 0;
 
-   /*  Each mask is used for every two base addresses. */
-   u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr >> 1];
+   if (!cs_mode)
+   return size;
 
-   /*  Register [31:1] = Address [39:9]. Size is in kBs here. */
- 

[PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP

2019-07-09 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting EDAC_CTL_CAP.

Set the appropriate EDAC_CTL_CAP flag based on the device type.

Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-4-yazen.ghan...@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index dd60cf5a3d96..125d6e2a828e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 static inline void
 f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 {
-   u8 i, ecc_en = 1, cpk_en = 1;
+   u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
 
for_each_umc(i) {
if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
+
+   dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
+   dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
}
}
 
@@ -3163,8 +3166,12 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
if (ecc_en) {
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
 
-   if (cpk_en)
-   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+   if (cpk_en) {
+   if (dev_x4)
+   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+   else if (dev_x16)
+   mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
+   }
}
 }
 
-- 
2.17.1



[PATCH v2 3/7] EDAC/amd64: Initialize DIMM info for systems with more than two channels

2019-07-09 Thread Ghannam, Yazen
From: Yazen Ghannam 

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it has a
limit of two channel support.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Fixes: bdcee7747f5c ("EDAC/amd64: Support more than two Unified Memory Controllers")
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190531234501.32826-5-yazen.ghan...@amd.com

v1->v2:
* No change.

 drivers/edac/amd64_edac.c | 63 ++-
 1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 125d6e2a828e..d0926b181c7c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2837,6 +2837,46 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
return nr_pages;
 }
 
+static int init_csrows_df(struct mem_ctl_info *mci)
+{
+   struct amd64_pvt *pvt = mci->pvt_info;
+   enum edac_type edac_mode = EDAC_NONE;
+   enum dev_type dev_type = DEV_UNKNOWN;
+   struct dimm_info *dimm;
+   int empty = 1;
+   u8 umc, cs;
+
+   if (mci->edac_ctl_cap & EDAC_FLAG_S16ECD16ED) {
+   edac_mode = EDAC_S16ECD16ED;
+   dev_type = DEV_X16;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED) {
+   edac_mode = EDAC_S4ECD4ED;
+   dev_type = DEV_X4;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED) {
+   edac_mode = EDAC_SECDED;
+   }
+
+   for_each_umc(umc) {
+   for_each_chip_select(cs, umc, pvt) {
+   if (!csrow_enabled(cs, umc, pvt))
+   continue;
+
+   empty = 0;
+   dimm = mci->csrows[cs]->channels[umc]->dimm;
+
+   edac_dbg(1, "MC node: %d, csrow: %d\n",
+   pvt->mc_node_id, cs);
+
+   dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+   dimm->mtype = pvt->dram_type;
+   dimm->edac_mode = edac_mode;
+   dimm->dtype = dev_type;
+   }
+   }
+
+   return empty;
+}
+
 /*
  * Initialize the array of csrow attribute instances, based on the values
  * from pci config hardware registers.
@@ -2851,15 +2891,16 @@ static int init_csrows(struct mem_ctl_info *mci)
int nr_pages = 0;
u32 val;
 
-   if (!pvt->umc) {
-   amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
+   if (pvt->umc)
+   return init_csrows_df(mci);
+
+   amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
 
-   pvt->nbcfg = val;
+   pvt->nbcfg = val;
 
-   edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: 
%d|DramEccEn: %d]\n",
-pvt->mc_node_id, val,
-!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
-   }
+   edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n",
+pvt->mc_node_id, val,
+!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
 
/*
 * We iterate over DCT0 here but we look at DCT1 in parallel, if needed.
@@ -2896,13 +2937,7 @@ static int init_csrows(struct mem_ctl_info *mci)
edac_dbg(1, "Total csrow%d pages: %u\n", i, nr_pages);
 
/* Determine DIMM ECC mode: */
-   if (pvt->umc) {
-   if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED)
-   edac_mode = EDAC_S4ECD4ED;
-   else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED)
-   edac_mode = EDAC_SECDED;
-
-   } else if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
+   if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
edac_mode = (pvt->nbcfg & NBCFG_CHIPKILL)
? EDAC_S4ECD4ED
: EDAC_SECDED;
-- 
2.17.1



[PATCH v2 0/7] AMD64 EDAC fixes

2019-07-09 Thread Ghannam, Yazen
From: Yazen Ghannam 

Hi Boris,

This set contains a few fixes for some changes merged in v5.2. There
are also a couple of fixes for older issues. In addition, there are a
couple of patches to add support for Asymmetric Dual-Rank DIMMs.

Thanks,
Yazen

Link:
https://lkml.kernel.org/r/20190531234501.32826-1-yazen.ghan...@amd.com

v1->v2:
* Squash patches 1 and 2 together.

Yazen Ghannam (7):
  EDAC/amd64: Support more than two controllers for chip selects
handling
  EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  EDAC/amd64: Initialize DIMM info for systems with more than two
channels
  EDAC/amd64: Find Chip Select memory size using Address Mask
  EDAC/amd64: Decode syndrome before translating address
  EDAC/amd64: Cache secondary Chip Select registers
  EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

 drivers/edac/amd64_edac.c | 348 --
 drivers/edac/amd64_edac.h |   9 +-
 2 files changed, 232 insertions(+), 125 deletions(-)

-- 
2.17.1



RE: [PATCH 2/8] EDAC/amd64: Support more than two controllers for chip selects handling

2019-06-14 Thread Ghannam, Yazen
> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf Of Borislav Petkov
> Sent: Thursday, June 13, 2019 5:23 PM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 2/8] EDAC/amd64: Support more than two controllers for 
> chip selects handling
> 
> On Thu, Jun 13, 2019 at 08:58:16PM +, Ghannam, Yazen wrote:
> > The first patch is meant as a fix for existing systems, and this patch
> > is to add new functionality.
> >
> > I can merge them together if you think that's more appropriate.
> 
> Is it fixing such a critical issue that it needs to be a separate patch?
> If so, it should be CC:stable.
> 
> But I think we've survived without it just fine so why bother. But maybe
> there's an aspect I'm missing...
> 

No, you're right. It's not something critical.

I can squash these two patches together if you'd like.

Thanks,
Yazen


RE: [PATCH 1/8] EDAC/amd64: Fix number of DIMMs and Chip Select bases/masks on Family17h

2019-06-13 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Thursday, June 13, 2019 8:58 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/8] EDAC/amd64: Fix number of DIMMs and Chip Select 
> bases/masks on Family17h
> 
> On Fri, May 31, 2019 at 11:45:11PM +, Ghannam, Yazen wrote:
> > From: Yazen Ghannam 
> >
> > ...because AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS
> > masks per channel.
> >
> > Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
> > Signed-off-by: Yazen Ghannam 
> > ---
> >  drivers/edac/amd64_edac.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index 873437be86d9..9fa2f205f05c 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -810,7 +810,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
> >
> > edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
> >
> > -   for (dimm = 0; dimm < 4; dimm++) {
> > +   for (dimm = 0; dimm < 2; dimm++) {
> > size0 = 0;
> > cs0 = dimm * 2;
> >
> > @@ -942,6 +942,9 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> > } else if (pvt->fam == 0x15 && pvt->model == 0x30) {
> > pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
> > pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> > +   } else if (pvt->fam >= 0x17) {
> > +   pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
> > +   pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> 
> I guess it is about time that function gets turned into a switch-case so
> that the assignment lines do not get duplicated.
> 

Okay, I'll write up a patch for that.

Do you have any tips on how to handle it? I'm thinking it may be tricky because 
of the ranges and multiple variables.
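
For illustration, a rough sketch of the shape I have in mind, using only the
branches visible in the hunks above (the earlier family checks are omitted);
the fam >= 0x17 range is exactly the part that does not switch cleanly:

static void prep_chip_selects(struct amd64_pvt *pvt)
{
	int umc;

	switch (pvt->fam) {
	case 0x15:
		if (pvt->model != 0x30)
			break;
		pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
		pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
		return;
	default:
		if (pvt->fam < 0x17)
			break;
		for_each_umc(umc) {
			pvt->csels[umc].b_cnt = 4;
			pvt->csels[umc].m_cnt = 2;
		}
		return;
	}

	pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
	pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
}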

Thanks,
Yazen


RE: [PATCH 2/8] EDAC/amd64: Support more than two controllers for chip selects handling

2019-06-13 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Thursday, June 13, 2019 9:17 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 2/8] EDAC/amd64: Support more than two controllers for 
> chip selects handling
> 
> On Fri, May 31, 2019 at 11:45:12PM +, Ghannam, Yazen wrote:
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index 9fa2f205f05c..dd60cf5a3d96 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -943,91 +943,101 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> > pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
> > pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> > } else if (pvt->fam >= 0x17) {
> > -   pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
> > -   pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> > +   int umc;
> > +
> > +   for_each_umc(umc) {
> > +   pvt->csels[umc].b_cnt = 4;
> > +   pvt->csels[umc].m_cnt = 2;
> > +   }
> > +
> 
> What is the purpose of the previous commit if you're changing it here in
> the next one?
> 

The first patch is meant as a fix for existing systems, and this patch is to 
add new functionality.

I can merge them together if you think that's more appropriate.

Thanks,
Yazen


[PATCH v4 5/5] x86/MCE: Determine MCA banks' init state properly

2019-06-07 Thread Ghannam, Yazen
From: Yazen Ghannam 

The OS is expected to write all bits to MCA_CTL for each bank,
thus enabling error reporting in all banks. However, some banks
may be unused, in which case the registers for such banks are
Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
bits because of quirks, etc.

A bank can be considered uninitialized if the MCA_CTL register returns
zero. This is because either the OS did not write anything or because
the hardware is enforcing RAZ/WI for the bank.

Set a bank's init value based on whether the control bits are set in
hardware. Return an error code in the sysfs interface for uninitialized
banks.

Do a final bank init check in a separate function which is not part of
any user-controlled code flows. This is so a user may enable/disable a
bank during runtime without having to restart their system.

 [ bp: Massage a bit. Discover bank init state at boot. ]

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190430203206.104163-6-yazen.ghan...@amd.com

v3->v4:
* Patch 5 version 3 was zapped.
* This is based on new patch from Boris which was based on Patch 6 version 3.
* Reworked new patch to fix sysfs issue with disabled banks.

v2->v3:
* No change.

v1->v2:
* No change.

 arch/x86/kernel/cpu/mce/core.c | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 6813712d8648..ae7723a942f6 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1490,6 +1490,11 @@ static void __mcheck_cpu_mce_banks_init(void)
for (i = 0; i < n_banks; i++) {
struct mce_bank *b = &mce_banks[i];
 
+   /*
+* Init them all, __mcheck_cpu_apply_quirks() is going to apply
+* the required vendor quirks before
+* __mcheck_cpu_init_clear_banks() does the final bank setup.
+*/
b->ctl = -1ULL;
b->init = 1;
}
@@ -1562,6 +1567,33 @@ static void __mcheck_cpu_init_clear_banks(void)
}
 }
 
+/*
+ * Do a final check to see if there are any unused/RAZ banks.
+ *
+ * This must be done after the banks have been initialized and any quirks have
+ * been applied.
+ *
+ * Do not call this from any user-initiated flows, e.g. CPU hotplug or sysfs.
+ * Otherwise, a user who disables a bank will not be able to re-enable it
+ * without a system reboot.
+ */
+static void __mcheck_cpu_check_banks(void)
+{
+   struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+   u64 msrval;
+   int i;
+
+   for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
+   struct mce_bank *b = &mce_banks[i];
+
+   if (!b->init)
+   continue;
+
+   rdmsrl(msr_ops.ctl(i), msrval);
+   b->init = !!msrval;
+   }
+}
+
 /*
  * During IFU recovery Sandy Bridge -EP4S processors set the RIPV and
  * EIPV bits in MCG_STATUS to zero on the affected logical processor (SDM
@@ -1849,6 +1881,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
__mcheck_cpu_init_generic();
__mcheck_cpu_init_vendor(c);
__mcheck_cpu_init_clear_banks();
+   __mcheck_cpu_check_banks();
__mcheck_cpu_setup_timer();
 }
 
@@ -2085,6 +2118,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
 
b = &per_cpu(mce_banks_array, s->id)[bank];
 
+   if (!b->init)
+   return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
 }
 
@@ -2103,6 +2139,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 
b = &per_cpu(mce_banks_array, s->id)[bank];
 
+   if (!b->init)
+   return -ENODEV;
+
b->ctl = new;
mce_restart();
 
-- 
2.17.1



[PATCH v4 3/5] x86/MCE/AMD: Don't cache block addresses on SMCA systems

2019-06-07 Thread Ghannam, Yazen
From: Yazen Ghannam 

On legacy systems, the addresses of the MCA_MISC* registers need to be
recursively discovered based on a Block Pointer field in the registers.

On Scalable MCA systems, the register space is fixed, and particular
addresses can be derived by regular offsets for bank and register type.
This fixed address space includes the MCA_MISC* registers.

MCA_MISC0 is always available for each MCA bank. MCA_MISC1 through
MCA_MISC4 are considered available if MCA_MISC0[BlkPtr]=1.

Cache the value of MCA_MISC0[BlkPtr] for each bank and per CPU. This
needs to be done only during init. The values should be saved per CPU
to accommodate heterogeneous SMCA systems.

Redo smca_get_block_address() to directly return the block addresses.
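
For reference, the fixed layout relied on here is captured by the SMCA MSR
macros in arch/x86/include/asm/msr-index.h (trimmed; check your tree for the
authoritative definitions):

#define MSR_AMD64_SMCA_MC0_MISC0	0xc0002003
#define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a

/* Each bank occupies a fixed window of 0x10 MSRs. */
#define MSR_AMD64_SMCA_MCx_MISC(x)	(MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
#define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))

So block 0 of bank b is always MSR_AMD64_SMCA_MCx_MISC(b), and blocks 1-4 map
to MSR_AMD64_SMCA_MCx_MISCy(b, block - 1), which is what the reworked
smca_get_block_address() below returns.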

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190430203206.104163-4-yazen.ghan...@amd.com

v3->v4:
* No change.

v2->v3:
* Change name of new map variable to "smca_misc_banks_map".
* Use "BIT()" where appropriate.

v1->v2:
* No change.

 arch/x86/kernel/cpu/mce/amd.c | 73 ++-
 1 file changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d904aafe6409..d4d6e4b7f9dc 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -101,11 +101,6 @@ static struct smca_bank_name smca_names[] = {
[SMCA_PCIE] = { "pcie", "PCI Express Unit" },
 };
 
-static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
-{
-   [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
-};
-
 static const char *smca_get_name(enum smca_bank_types t)
 {
if (t >= N_SMCA_BANK_TYPES)
@@ -199,6 +194,9 @@ static char buf_mcatype[MAX_MCATYPE_NAME_LEN];
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned int, bank_map); /* see which banks are on */
 
+/* Map of banks that have more than MCA_MISC0 available. */
+static DEFINE_PER_CPU(u32, smca_misc_banks_map);
+
 static void amd_threshold_interrupt(void);
 static void amd_deferred_error_interrupt(void);
 
@@ -208,6 +206,28 @@ static void default_deferred_error_interrupt(void)
 }
 void (*deferred_error_int_vector)(void) = default_deferred_error_interrupt;
 
+static void smca_set_misc_banks_map(unsigned int bank, unsigned int cpu)
+{
+   u32 low, high;
+
+   /*
+* For SMCA enabled processors, BLKPTR field of the first MISC register
+* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
+*/
if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
+   return;
+
+   if (!(low & MCI_CONFIG_MCAX))
+   return;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high))
+   return;
+
+   if (low & MASK_BLKPTR_LO)
+   per_cpu(smca_misc_banks_map, cpu) |= BIT(bank);
+
+}
+
 static void smca_configure(unsigned int bank, unsigned int cpu)
 {
unsigned int i, hwid_mcatype;
@@ -245,6 +265,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
wrmsr(smca_config, low, high);
}
 
+   smca_set_misc_banks_map(bank, cpu);
+
/* Return early if this bank was already initialized. */
if (smca_banks[bank].hwid)
return;
@@ -455,42 +477,21 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
-static u32 smca_get_block_address(unsigned int bank, unsigned int block)
+static u32 smca_get_block_address(unsigned int bank, unsigned int block,
+ unsigned int cpu)
 {
-   u32 low, high;
-   u32 addr = 0;
-
-   if (smca_get_bank_type(bank) == SMCA_RESERVED)
-   return addr;
-
if (!block)
return MSR_AMD64_SMCA_MCx_MISC(bank);
 
-   /* Check our cache first: */
-   if (smca_bank_addrs[bank][block] != -1)
-   return smca_bank_addrs[bank][block];
-
-   /*
-* For SMCA enabled processors, BLKPTR field of the first MISC register
-* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
-*/
-   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
-   goto out;
-
-   if (!(low & MCI_CONFIG_MCAX))
-   goto out;
-
-   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
-   (low & MASK_BLKPTR_LO))
-   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   if (!(per_cpu(smca_misc_banks_map, cpu) & BIT(bank)))
+   return 0;
 
-out:
-   smca_bank_addrs[bank][block] = addr;
-   return addr;
+   return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
 }
 
 static u32 get_block_address(u32 current_addr, u32 low, u32 high,
-unsigned int bank, unsigned int block)
+unsigned int bank, unsigned int block,
+unsigned int cpu)
 {
   

[PATCH v4 1/5] x86/MCE: Make struct mce_banks[] static

2019-06-07 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct mce_banks[] array is only used in mce/core.c so move its
definition there and make it static. Also, change the "init" field to
bool type.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190430203206.104163-2-yazen.ghan...@amd.com

v3->v4:
* No changes

v2->v3:
* No changes

v1->v2:
* No changes

 arch/x86/kernel/cpu/mce/core.c | 11 ++-
 arch/x86/kernel/cpu/mce/internal.h | 10 --
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 282916f3b8d8..55bdbedde0b8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -65,7 +65,16 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
-struct mce_bank *mce_banks __read_mostly;
+#define ATTR_LEN   16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank {
+   u64 ctl;/* subevents to enable */
+   boolinit;   /* initialise bank? */
+   struct device_attribute attr;   /* device attribute */
+   charattrname[ATTR_LEN]; /* attribute name */
+};
+
+static struct mce_bank *mce_banks __read_mostly;
 struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index a34b55baa7aa..35b3e5c02c1c 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -22,17 +22,8 @@ enum severity_level {
 
 extern struct blocking_notifier_head x86_mce_decoder_chain;
 
-#define ATTR_LEN   16
 #define INITIAL_CHECK_INTERVAL 5 * 60 /* 5 minutes */
 
-/* One object for each MCE bank, shared by all CPUs */
-struct mce_bank {
-   u64 ctl;/* subevents to enable */
-   unsigned char init; /* initialise bank? */
-   struct device_attribute attr;   /* device attribute */
-   charattrname[ATTR_LEN]; /* attribute name */
-};
-
 struct mce_evt_llist {
struct llist_node llnode;
struct mce mce;
@@ -47,7 +38,6 @@ struct llist_node *mce_gen_pool_prepare_records(void);
extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
 struct dentry *mce_get_debugfs_dir(void);
 
-extern struct mce_bank *mce_banks;
 extern mce_banks_t mce_banks_ce_disabled;
 
 #ifdef CONFIG_X86_MCE_INTEL
-- 
2.17.1



[PATCH v4 4/5] x86/MCE: Make the number of MCA banks a per-CPU variable

2019-06-07 Thread Ghannam, Yazen
From: Yazen Ghannam 

The number of MCA banks is provided per logical CPU. Historically, this
number has been the same across all CPUs, but this is not an
architectural guarantee. Future AMD systems may have MCA bank counts
that vary between logical CPUs in a system.

This issue was partially addressed in

  006c077041dc ("x86/mce: Handle varying MCA bank counts")

by allocating structures using the maximum number of MCA banks and by
saving the maximum MCA bank count in a system as the global count. This
means that some extra structures are allocated. Also, this means that
CPUs will spend more time in the #MC and other handlers checking extra
MCA banks.

Thus, define the number of MCA banks as a per-CPU variable.

 [ bp: Make mce_num_banks an unsigned int. ]

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190430203206.104163-5-yazen.ghan...@amd.com

v3->v4:
* Include Boris' change.

v2->v3:
* Drop pr_debug() message.
* Change commit reference format.

v1->v2:
* Drop export of new variable and leave injector code as-is.
* Add "mce_" prefix to new "num_banks" variable.

 arch/x86/kernel/cpu/mce/amd.c  | 19 +++--
 arch/x86/kernel/cpu/mce/core.c | 45 +-
 arch/x86/kernel/cpu/mce/internal.h |  2 +-
 3 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d4d6e4b7f9dc..fb5c935af2c5 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -495,7 +495,7 @@ static u32 get_block_address(u32 current_addr, u32 low, u32 high,
 {
u32 addr = 0, offset = 0;
 
-   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+   if ((bank >= per_cpu(mce_num_banks, cpu)) || (block >= NR_BLOCKS))
return addr;
 
if (mce_flags.smca)
@@ -627,11 +627,12 @@ void disable_err_thresholding(struct cpuinfo_x86 *c, unsigned int bank)
 /* cpu init entry point, called from mce.c with preempt off */
 void mce_amd_feature_init(struct cpuinfo_x86 *c)
 {
-   u32 low = 0, high = 0, address = 0;
unsigned int bank, block, cpu = smp_processor_id();
+   u32 low = 0, high = 0, address = 0;
int offset = -1;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (mce_flags.smca)
smca_configure(bank, cpu);
 
@@ -976,7 +977,7 @@ static void amd_deferred_error_interrupt(void)
 {
unsigned int bank;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank)
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank)
log_error_deferred(bank);
 }
 
@@ -1017,7 +1018,7 @@ static void amd_threshold_interrupt(void)
struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
unsigned int bank, cpu = smp_processor_id();
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
 
@@ -1204,7 +1205,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
u32 low, high;
int err;
 
-   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+   if ((bank >= per_cpu(mce_num_banks, cpu)) || (block >= NR_BLOCKS))
return 0;
 
if (rdmsr_safe_on_cpu(cpu, address, &low, &high))
@@ -1438,7 +1439,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 {
unsigned int bank;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < per_cpu(mce_num_banks, cpu); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
threshold_remove_bank(cpu, bank);
@@ -1459,14 +1460,14 @@ int mce_threshold_create_device(unsigned int cpu)
if (bp)
return 0;
 
-   bp = kcalloc(mca_cfg.banks, sizeof(struct threshold_bank *),
+   bp = kcalloc(per_cpu(mce_num_banks, cpu), sizeof(struct threshold_bank *),
 GFP_KERNEL);
if (!bp)
return -ENOMEM;
 
per_cpu(threshold_banks, cpu) = bp;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < per_cpu(mce_num_banks, cpu); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
err = threshold_create_bank(cpu, bank);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index c505b10f912a..6813712d8648 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -65,6 +65,8 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
+DEFINE_PER_CPU_READ_MOSTLY(unsigned int, mce_num_banks);
+
 struct mce_bank {
u64 ctl;/* subevents to enable */
   

[PATCH v4 2/5] x86/MCE: Make mce_banks a per-CPU array

2019-06-07 Thread Ghannam, Yazen
From: Yazen Ghannam 

Current AMD systems have unique MCA banks per logical CPU even though
the type of the banks may all align to the same bank number. Each CPU
will have control of a set of MCA banks in the hardware and these are
not shared with other CPUs.

For example, bank 0 may be the Load-Store Unit on every logical CPU, but
each bank 0 is a unique structure in the hardware. In other words, there
isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
share.

This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
may see a Unified Memory Controller at bank 15, but each CPU is actually
seeing a unique hardware structure that is not shared with other CPUs.

Because the MCA banks are all unique hardware structures, it would be
good to control them in a more granular way. For example, if there is a
known issue with the Floating Point Unit on CPU5 and a user wishes to
disable an error type on the Floating Point Unit, then it would be good
to do this only for CPU5 rather than all CPUs.

Also, future AMD systems may have heterogeneous MCA banks. Meaning
the bank numbers may not necessarily represent the same types between
CPUs. For example, bank 20 visible to CPU0 may be a Unified Memory
Controller and bank 20 visible to CPU4 may be a Coherent Slave. So
granular control will be even more necessary should the user wish to
control specific MCA banks.

Split the device attributes from struct mce_bank leaving only the MCA
bank control fields.

Make struct mce_banks[] per_cpu in order to have more granular control
over individual MCA banks in the hardware.

Allocate the device attributes statically based on the maximum number of
MCA banks supported. The sysfs interface will use as many as needed per
CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
per_cpu bank count in a future patch.

Allocate the MCA control bits statically. This is in order to avoid
locking warnings when memory is allocated during secondary CPUs' init
sequences.

Also, remove the now unnecessary return values from
__mcheck_cpu_mce_banks_init() and __mcheck_cpu_cap_init().

Redo the sysfs store/show functions to handle the per_cpu mce_banks[].

 [ bp: s/mce_banks_percpu/mce_banks_array/g ]

[ Locking issue reported by ]
Reported-by: kernel test robot 
[ Locking issue fix suggested by ]
Suggested-by: Borislav Petkov 
Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190430203206.104163-3-yazen.ghan...@amd.com

v3->v4:
* Statically allocate all structures to avoid locking issues with
  secondary CPUs.

v2->v3:
* Keep old member alignment in struct mce_bank.
* Change "cpu" to "CPU" in modified comment.
* Use a local array pointer when doing multiple per_cpu accesses.

v1->v2:
* Change "struct mce_bank*" to "struct mce_bank *" in definition.

 arch/x86/kernel/cpu/mce/core.c | 76 +-
 1 file changed, 48 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 55bdbedde0b8..c505b10f912a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -65,16 +65,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
-#define ATTR_LEN   16
-/* One object for each MCE bank, shared by all CPUs */
 struct mce_bank {
u64 ctl;/* subevents to enable */
boolinit;   /* initialise bank? */
+};
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank [MAX_NR_BANKS], mce_banks_array);
+
+#define ATTR_LEN   16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank_dev {
struct device_attribute attr;   /* device attribute */
charattrname[ATTR_LEN]; /* attribute name */
+   u8  bank;   /* bank number */
 };
+static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];
 
-static struct mce_bank *mce_banks __read_mostly;
 struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
@@ -684,6 +689,7 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
  */
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
+   struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
bool error_seen = false;
struct mce m;
int i;
@@ -1131,6 +1137,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
unsigned long *toclear, unsigned long *valid_banks,
int no_way_out, int *worst)
 {
+   struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
struct mca_config *cfg = &mca_cfg;
int severity, i;
 
@@ -1472,27 +1479,23 @@ int mce_notify_irq(void)
 }
 EXPORT_SYMBOL_GPL(mce_notify_irq);
 
-static int __mcheck_cpu_mce_banks_init(void)
+static void __mcheck_cpu_mce_banks_init(void)
 {
+   struct 

[PATCH v4 0/5] Handle MCA banks in a per_cpu way

2019-06-07 Thread Ghannam, Yazen
From: Yazen Ghannam 

The focus of this patchset is to define and use the MCA bank structures
and bank count per logical CPU.

With the exception of patch 4, this set applies to systems in production
today.

Patch 1:
Moves the declaration of struct mce_banks[] to the only file it's used.

Patch 2:
Splits struct mce_bank into a structure for fields common to MCA banks
on all CPUs and another structure that can be used per_cpu.

Patch 3:
Brings full circle the saga of the threshold block addresses on SMCA
systems. After taking a step back and reviewing the AMD documentation, I
think that this implementation is the simplest and most robust way to
follow the spec.

Patch 4:
Saves and uses the MCA bank count as a per_cpu variable. This is to
support systems that have MCA bank counts that are different between
logical CPUs.

Patch 5:
Checks if an MCA bank is enabled after initialization.

Link:
https://lkml.kernel.org/r/20190430203206.104163-1-yazen.ghan...@amd.com

Thanks,
Yazen

Yazen Ghannam (5):
  x86/MCE: Make struct mce_banks[] static
  x86/MCE: Make mce_banks a per-CPU array
  x86/MCE/AMD: Don't cache block addresses on SMCA systems
  x86/MCE: Make the number of MCA banks a per-CPU variable
  x86/MCE: Determine MCA banks' init state properly

 arch/x86/kernel/cpu/mce/amd.c  |  92 +
 arch/x86/kernel/cpu/mce/core.c | 161 +
 arch/x86/kernel/cpu/mce/internal.h |  12 +--
 3 files changed, 165 insertions(+), 100 deletions(-)

-- 
2.17.1



RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-06-07 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, June 7, 2019 11:59 AM
> To: Ghannam, Yazen 
> Cc: Luck, Tony ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> On Fri, Jun 07, 2019 at 04:44:24PM +, Ghannam, Yazen wrote:
> > I have another version of this set that I can send today. It includes
> > the changes for this patch and also includes the fix for the locking
> > bug message.
> >
> > Should I send out the new version? Or do you want me to wait for any
> > fixes on top of the current version?
> 
> I don't understand - I think we said to feel free to rework it all by using
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc0%2b3-ras
> 
> and reworking the whole branch to accomodate the changes and then
> sending a whole new series...
> 

Right, I took that branch, squashed the locking fix into patch 2, fixed up the 
remaining patches, and then redid the last patch.

I plan to send the result as a v4 of this patchset with all the links, version 
history, etc. Is that what you mean? Sorry if I misunderstood.

Thanks,
Yazen


RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-06-07 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, June 7, 2019 11:37 AM
> To: Ghannam, Yazen 
> Cc: Luck, Tony ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> On Fri, Jun 07, 2019 at 02:49:42PM +, Ghannam, Yazen wrote:
> > Would you mind if the function name stayed the same? The reason is
> > that MCA_CTL is written here, which is the "init" part, and MCA_STATUS
> > is cleared.
> >
> > I can use another name for the check, e.g. __mcheck_cpu_check_banks()
> > or __mcheck_cpu_banks_check_init().
> 
> Nevermind, leave it as is. I'll fix it up ontop. I don't like that
> "__mcheck_cpu_init" prefixing there which is a mouthful and should
> simply be "mce_cpu_" to denote that it is a function which is
> run on a CPU to setup stuff.
> 

Yeah, I agree.

I have another version of this set that I can send today. It includes the 
changes for this patch and also includes the fix for the locking bug message.

Should I send out the new version? Or do you want me to wait for any fixes on 
top of the current version?

Thanks,
Yazen


RE: [PATCH] tools/power turbostat: Make interval calculation per thread to reduce jitter

2019-06-07 Thread Ghannam, Yazen
> -Original Message-
> From: Ghannam, Yazen 
> Sent: Tuesday, April 23, 2019 12:53 PM
> To: Ghannam, Yazen ; linux...@vger.kernel.org; 
> len.br...@intel.com
> Cc: linux-kernel@vger.kernel.org; Len Brown 
> Subject: RE: [PATCH] tools/power turbostat: Make interval calculation per 
> thread to reduce jitter
> 
> > -Original Message-
> > From: linux-kernel-ow...@vger.kernel.org 
> >  On Behalf Of Ghannam, Yazen
> > Sent: Monday, March 25, 2019 12:33 PM
> > To: linux...@vger.kernel.org
> > Cc: Ghannam, Yazen ; linux-kernel@vger.kernel.org; 
> > l...@kernel.org
> > Subject: [PATCH] tools/power turbostat: Make interval calculation per 
> > thread to reduce jitter
> >
> > From: Yazen Ghannam 
> >
> > Turbostat currently normalizes TSC and other values by dividing by an
> > interval. This interval is the delta between the start of one global
> > (all counters on all CPUs) sampling and the start of another. However,
> > this introduces a lot of jitter into the data.
> >
> > In order to reduce jitter, the interval calculation should be based on
> > timestamps taken per thread and close to the start of the thread's
> > sampling.
> >
> > Define a per thread time value to hold the delta between samples taken
> > on the thread.
> >
> > Use the timestamp taken at the beginning of sampling to calculate the
> > delta.
> >
> > Move the thread's beginning timestamp to after the CPU migration to
> > avoid jitter due to the migration.
> >
> > Use the global time delta for the average time delta.
> >
> > Signed-off-by: Yazen Ghannam 
> > ---
> 
> Hi Len,
> 
> Any comments on this patch?
> 

Hi Len,

Just wanted to check in. Do you have any comments on this patch?

Thanks,
Yazen


RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-06-07 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Monday, May 27, 2019 6:29 PM
> To: Ghannam, Yazen 
> Cc: Luck, Tony ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> 
> I guess the cleanest way to handle his properly would be to have a
> function called something like __mcheck_cpu_init_banks() which gets
> called in mcheck_cpu_init() after the quirks have run and then does the
> final poking of the banks and sets b->init properly.
> 
> __mcheck_cpu_init_clear_banks() should then be renamed to
> __mcheck_cpu_clear_banks() to denote that it only clears the banks and
> would only do:
> 
> if (!b->init)
> continue;
> 
> wrmsrl(msr_ops.ctl(i), b->ctl);
> wrmsrl(msr_ops.status(i), 0);
> 

Would you mind if the function name stayed the same? The reason is that MCA_CTL 
is written here, which is the "init" part, and MCA_STATUS is cleared.

I can use another name for the check, e.g. __mcheck_cpu_check_banks() or 
__mcheck_cpu_banks_check_init().

Thanks,
Yazen


[PATCH 6/8] EDAC/amd64: Decode syndrome before translating address

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.

Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.

Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f0424c10cac0..4058b24b8e04 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2567,13 +2567,6 @@ static void decode_umc_error(int node_id, struct mce *m)
 
err.channel = find_umc_channel(m);
 
-   if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
-   err.err_code = ERR_NORM_ADDR;
-   goto log_error;
-   }
-
-   error_address_to_page_and_offset(sys_addr, &err);
-
if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
goto log_error;
@@ -2590,6 +2583,13 @@ static void decode_umc_error(int node_id, struct mce *m)
 
err.csrow = m->synd & 0x7;
 
+   if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+   err.err_code = ERR_NORM_ADDR;
+   goto log_error;
+   }
+
+   error_address_to_page_and_offset(sys_addr, &err);
+
 log_error:
__log_ecc_error(mci, , ecc_type);
 }
-- 
2.17.1



[PATCH 3/8] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting EDAC_CTL_CAP.

Set the appropriate EDAC_CTL_CAP flag based on the device type.

Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index dd60cf5a3d96..125d6e2a828e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
 static inline void
 f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
 {
-   u8 i, ecc_en = 1, cpk_en = 1;
+   u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
 
for_each_umc(i) {
if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
+
+   dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
+   dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
}
}
 
@@ -3163,8 +3166,12 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
if (ecc_en) {
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
 
-   if (cpk_en)
-   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+   if (cpk_en) {
+   if (dev_x4)
+   mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+   else if (dev_x16)
+   mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
+   }
}
 }
 
-- 
2.17.1



[PATCH 7/8] EDAC/amd64: Cache secondary Chip Select registers

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

AMD Family 17h systems have a set of secondary Chip Select Base
Addresses and Address Masks. These do not represent unique Chip
Selects; rather, they are used in conjunction with the primary
Chip Select registers in certain use cases.

Cache these secondary Chip Select registers for future use.

Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 23 ---
 drivers/edac/amd64_edac.h |  4 
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4058b24b8e04..006417cb79dc 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -943,34 +943,51 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
 
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
-   u32 umc_base_reg, umc_mask_reg;
-   u32 base_reg, mask_reg;
-   u32 *base, *mask;
+   u32 umc_base_reg, umc_base_reg_sec;
+   u32 umc_mask_reg, umc_mask_reg_sec;
+   u32 base_reg, base_reg_sec;
+   u32 mask_reg, mask_reg_sec;
+   u32 *base, *base_sec;
+   u32 *mask, *mask_sec;
int cs, umc;
 
for_each_umc(umc) {
umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+   umc_base_reg_sec = get_umc_base(umc) + UMCCH_BASE_ADDR_SEC;
 
for_each_chip_select(cs, umc, pvt) {
base = &pvt->csels[umc].csbases[cs];
+   base_sec = &pvt->csels[umc].csbases_sec[cs];
 
base_reg = umc_base_reg + (cs * 4);
+   base_reg_sec = umc_base_reg_sec + (cs * 4);
 
if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
 umc, cs, *base, base_reg);
+
+   if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, base_sec))
+   edac_dbg(0, "DCSB_SEC%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *base_sec, base_reg_sec);
}
 
umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+   umc_mask_reg_sec = get_umc_base(umc) + UMCCH_ADDR_MASK_SEC;
 
for_each_chip_select_mask(cs, umc, pvt) {
mask = &pvt->csels[umc].csmasks[cs];
+   mask_sec = &pvt->csels[umc].csmasks_sec[cs];
 
mask_reg = umc_mask_reg + (cs * 4);
+   mask_reg_sec = umc_mask_reg_sec + (cs * 4);
 
if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
 umc, cs, *mask, mask_reg);
+
+   if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, mask_sec))
+   edac_dbg(0, "DCSM_SEC%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *mask_sec, mask_reg_sec);
}
}
 }
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 4dce6a2ac75f..68f12de6e654 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -259,7 +259,9 @@
 
 /* UMC CH register offsets */
 #define UMCCH_BASE_ADDR0x0
+#define UMCCH_BASE_ADDR_SEC0x10
 #define UMCCH_ADDR_MASK0x20
+#define UMCCH_ADDR_MASK_SEC0x28
 #define UMCCH_ADDR_CFG 0x30
 #define UMCCH_DIMM_CFG 0x80
 #define UMCCH_UMC_CFG  0x100
@@ -312,9 +314,11 @@ struct dram_range {
 /* A DCT chip selects collection */
 struct chip_select {
u32 csbases[NUM_CHIPSELECTS];
+   u32 csbases_sec[NUM_CHIPSELECTS];
u8 b_cnt;
 
u32 csmasks[NUM_CHIPSELECTS];
+   u32 csmasks_sec[NUM_CHIPSELECTS];
u8 m_cnt;
 };
 
-- 
2.17.1



[PATCH 2/8] EDAC/amd64: Support more than two controllers for chip selects handling

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

This is a second version of a commit that was reverted.

Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 122 +-
 drivers/edac/amd64_edac.h |   5 +-
 2 files changed, 69 insertions(+), 58 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 9fa2f205f05c..dd60cf5a3d96 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -943,91 +943,101 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
} else if (pvt->fam >= 0x17) {
-   pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
-   pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+   int umc;
+
+   for_each_umc(umc) {
+   pvt->csels[umc].b_cnt = 4;
+   pvt->csels[umc].m_cnt = 2;
+   }
+
} else {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
}
 }
 
+static void read_umc_base_mask(struct amd64_pvt *pvt)
+{
+   u32 umc_base_reg, umc_mask_reg;
+   u32 base_reg, mask_reg;
+   u32 *base, *mask;
+   int cs, umc;
+
+   for_each_umc(umc) {
+   umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+
+   for_each_chip_select(cs, umc, pvt) {
+   base = &pvt->csels[umc].csbases[cs];
+
+   base_reg = umc_base_reg + (cs * 4);
+
+   if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+   edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *base, base_reg);
+   }
+
+   umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+
+   for_each_chip_select_mask(cs, umc, pvt) {
+   mask = &pvt->csels[umc].csmasks[cs];
+
+   mask_reg = umc_mask_reg + (cs * 4);
+
+   if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+   edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
+umc, cs, *mask, mask_reg);
+   }
+   }
+}
+
 /*
  * Function 2 Offset F10_DCSB0; read in the DCS Base and DCS Mask registers
  */
 static void read_dct_base_mask(struct amd64_pvt *pvt)
 {
-   int base_reg0, base_reg1, mask_reg0, mask_reg1, cs;
+   int cs;
 
prep_chip_selects(pvt);
 
-   if (pvt->umc) {
-   base_reg0 = get_umc_base(0) + UMCCH_BASE_ADDR;
-   base_reg1 = get_umc_base(1) + UMCCH_BASE_ADDR;
-   mask_reg0 = get_umc_base(0) + UMCCH_ADDR_MASK;
-   mask_reg1 = get_umc_base(1) + UMCCH_ADDR_MASK;
-   } else {
-   base_reg0 = DCSB0;
-   base_reg1 = DCSB1;
-   mask_reg0 = DCSM0;
-   mask_reg1 = DCSM1;
-   }
+   if (pvt->umc)
+   return read_umc_base_mask(pvt);
 
for_each_chip_select(cs, 0, pvt) {
-   int reg0   = base_reg0 + (cs * 4);
-   int reg1   = base_reg1 + (cs * 4);
+   int reg0   = DCSB0 + (cs * 4);
+   int reg1   = DCSB1 + (cs * 4);
u32 *base0 = &pvt->csels[0].csbases[cs];
u32 *base1 = &pvt->csels[1].csbases[cs];
 
-   if (pvt->umc) {
-   if (!amd_smn_read(pvt->mc_node_id, reg0, base0))
-   edac_dbg(0, "  DCSB0[%d]=0x%08x reg: 0x%x\n",
-cs, *base0, reg0);
-
-   if (!amd_smn_read(pvt->mc_node_id, reg1, base1))
-   edac_dbg(0, "  DCSB1[%d]=0x%08x reg: 0x%x\n",
-cs, *base1, reg1);
-   } else {
-   if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0))
-   edac_dbg(0, "  DCSB0[%d]=0x%08x reg: F2x%x\n",
-cs, *base0, reg0);
+   if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0))
+   edac_dbg(0, "  DCSB0[%d]=0x%08x reg: 

[PATCH 8/8] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

Future AMD systems will support "Asymmetric" Dual-Rank DIMMs. These are
DIMMs where the ranks are of different sizes.

The even rank will use the Primary Even Chip Select registers and the
odd rank will use the Secondary Odd Chip Select registers.

Recognize if a Secondary Odd Chip Select is being used. Use the
Secondary Odd Address Mask when calculating the chip select size.

Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 006417cb79dc..6c284a4f980c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -790,6 +790,9 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, 
u32 dclr, int chan)
 
 #define CS_EVEN_PRIMARYBIT(0)
 #define CS_ODD_PRIMARY BIT(1)
+#define CS_ODD_SECONDARY   BIT(2)
+
+#define csrow_sec_enabled(i, dct, pvt) ((pvt)->csels[(dct)].csbases_sec[(i)] & DCSB_CS_ENABLE)
 
 static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
@@ -801,6 +804,10 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
cs_mode |= CS_ODD_PRIMARY;
 
+   /* Asymmetric Dual-Rank DIMM support. */
+   if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt))
+   cs_mode |= CS_ODD_SECONDARY;
+
return cs_mode;
 }
 
@@ -1590,7 +1597,11 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 */
dimm = csrow_nr >> 1;
 
-   addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+   /* Asymmetric Dual-Rank DIMM support. */
+   if (cs_mode & CS_ODD_SECONDARY)
+   addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
+   else
+   addr_mask_orig = pvt->csels[umc].csmasks[dimm];
 
/*
 * The number of zero bits in the mask is equal to the number of bits
-- 
2.17.1



[PATCH 4/8] EDAC/amd64: Initialize DIMM info for systems with more than two channels

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it has a
limit of two channel support.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Fixes: bdcee7747f5c ("EDAC/amd64: Support more than two Unified Memory Controllers")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 63 ++-
 1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 125d6e2a828e..d0926b181c7c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2837,6 +2837,46 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
return nr_pages;
 }
 
+static int init_csrows_df(struct mem_ctl_info *mci)
+{
+   struct amd64_pvt *pvt = mci->pvt_info;
+   enum edac_type edac_mode = EDAC_NONE;
+   enum dev_type dev_type = DEV_UNKNOWN;
+   struct dimm_info *dimm;
+   int empty = 1;
+   u8 umc, cs;
+
+   if (mci->edac_ctl_cap & EDAC_FLAG_S16ECD16ED) {
+   edac_mode = EDAC_S16ECD16ED;
+   dev_type = DEV_X16;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED) {
+   edac_mode = EDAC_S4ECD4ED;
+   dev_type = DEV_X4;
+   } else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED) {
+   edac_mode = EDAC_SECDED;
+   }
+
+   for_each_umc(umc) {
+   for_each_chip_select(cs, umc, pvt) {
+   if (!csrow_enabled(cs, umc, pvt))
+   continue;
+
+   empty = 0;
+   dimm = mci->csrows[cs]->channels[umc]->dimm;
+
+   edac_dbg(1, "MC node: %d, csrow: %d\n",
+   pvt->mc_node_id, cs);
+
+   dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+   dimm->mtype = pvt->dram_type;
+   dimm->edac_mode = edac_mode;
+   dimm->dtype = dev_type;
+   }
+   }
+
+   return empty;
+}
+
 /*
  * Initialize the array of csrow attribute instances, based on the values
  * from pci config hardware registers.
@@ -2851,15 +2891,16 @@ static int init_csrows(struct mem_ctl_info *mci)
int nr_pages = 0;
u32 val;
 
-   if (!pvt->umc) {
-   amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
+   if (pvt->umc)
+   return init_csrows_df(mci);
+
+   amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
 
-   pvt->nbcfg = val;
+   pvt->nbcfg = val;
 
-   edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: 
%d|DramEccEn: %d]\n",
-pvt->mc_node_id, val,
-!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
-   }
+   edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n",
+pvt->mc_node_id, val,
+!!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
 
/*
 * We iterate over DCT0 here but we look at DCT1 in parallel, if needed.
@@ -2896,13 +2937,7 @@ static int init_csrows(struct mem_ctl_info *mci)
edac_dbg(1, "Total csrow%d pages: %u\n", i, nr_pages);
 
/* Determine DIMM ECC mode: */
-   if (pvt->umc) {
-   if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED)
-   edac_mode = EDAC_S4ECD4ED;
-   else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED)
-   edac_mode = EDAC_SECDED;
-
-   } else if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
+   if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
edac_mode = (pvt->nbcfg & NBCFG_CHIPKILL)
? EDAC_S4ECD4ED
: EDAC_SECDED;
-- 
2.17.1



[PATCH 5/8] EDAC/amd64: Find Chip Select memory size using Address Mask

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a few cases.

1) For single-rank, use the address mask as the size.

2) For dual-rank non-interleaved, use the address mask divided by 2 as
the size.

3) For dual-rank interleaved, do #2 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case, the
original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0x9FE, De-interleaved=0x3FE
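
To make the bit manipulation concrete, here is a small userspace sketch of
the de-interleaving step. The kernel helpers fls(), hweight_long() and
GENMASK() are replaced with local stand-ins, and the names are illustrative:

#include <stdint.h>
#include <stdio.h>

static int fls32(uint32_t x) { int n = 0; while (x) { n++; x >>= 1; } return n; }
static int hweight32(uint32_t x) { int n = 0; while (x) { n += x & 1; x >>= 1; } return n; }
#define GENMASK32(h, l) ((~0u >> (31 - (h))) & ~((1u << (l)) - 1u))

/* Count the zero bits inside the mask and drop them off the top. */
static uint32_t deinterleave_mask(uint32_t mask)
{
	int msb  = fls32(mask) - 1;		/* position of the highest set bit */
	int zero = msb - hweight32(mask);	/* zero bits in the middle; BIT(0) is always 0 */

	return GENMASK32(msb - zero, 1);
}

int main(void)
{
	/* Matches the example above: prints 0x3FE for 0x9FE. */
	printf("0x%X\n", deinterleave_mask(0x9FE));
	return 0;
}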

Fixes: fc00c6a41638 ("EDAC/amd64: Adjust printed chip select sizes when interleaved")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 107 ++
 1 file changed, 63 insertions(+), 44 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index d0926b181c7c..f0424c10cac0 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -788,51 +788,36 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
 (dclr & BIT(15)) ?  "yes" : "no");
 }
 
-/*
- * The Address Mask should be a contiguous set of bits in the non-interleaved
- * case. So to check for CS interleaving, find the most- and least-significant
- * bits of the mask, generate a contiguous bitmask, and compare the two.
- */
-static bool f17_cs_interleaved(struct amd64_pvt *pvt, u8 ctrl, int cs)
+#define CS_EVEN_PRIMARYBIT(0)
+#define CS_ODD_PRIMARY BIT(1)
+
+static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
 {
-   u32 mask = pvt->csels[ctrl].csmasks[cs >> 1];
-   u32 msb = fls(mask) - 1, lsb = ffs(mask) - 1;
-   u32 test_mask = GENMASK(msb, lsb);
+   int cs_mode = 0;
+
+   if (csrow_enabled(2 * dimm, ctrl, pvt))
+   cs_mode |= CS_EVEN_PRIMARY;
 
-   edac_dbg(1, "mask=0x%08x test_mask=0x%08x\n", mask, test_mask);
+   if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
+   cs_mode |= CS_ODD_PRIMARY;
 
-   return mask ^ test_mask;
+   return cs_mode;
 }
 
 static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 {
-   int dimm, size0, size1, cs0, cs1;
+   int dimm, size0, size1, cs0, cs1, cs_mode;
 
edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
for (dimm = 0; dimm < 2; dimm++) {
-   size0 = 0;
cs0 = dimm * 2;
-
-   if (csrow_enabled(cs0, ctrl, pvt))
-   size0 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs0);
-
-   size1 = 0;
cs1 = dimm * 2 + 1;
 
-   if (csrow_enabled(cs1, ctrl, pvt)) {
-   /*
-* CS interleaving is only supported if both CSes have
-* the same amount of memory. Because they are
-* interleaved, it will look like both CSes have the
-* full amount of memory. Save the size for both as
-* half the amount we found on CS0, if interleaved.
-*/
-   if (f17_cs_interleaved(pvt, ctrl, cs1))
-   size1 = size0 = (size0 >> 1);
-   else
-   size1 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs1);
-   }
+   cs_mode = f17_get_cs_mode(dimm, ctrl, pvt);
+
+   size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+   size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);
 
amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
cs0,size0,
cs1,size1);
@@ -1569,18 +1554,50 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
return ddr3_cs_size(cs_mode, false);
 }
 
-static int f17_base_addr_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
unsigned int cs_mode, int csrow_nr)
 {
-   u32 base_addr = pvt->csels[umc].csbases[csrow_nr];
+   u32 addr_mask_orig, addr_mask_deinterleaved;
+   u32 msb, weight, num_zero_bits;
+   int dimm, dual_rank, size = 0;
 
-   /*  Each mask is used for every two base addresses. */
-   u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr >> 1];
+   if (!cs_mode)
+   return size;
 
-   /*  Register [31:1] = Address [39:9]. Size is in kBs here. */
-   u32 size = ((addr_mask >> 1) - (base_addr >> 1) + 1) >> 1;
+   dual_rank = !!(cs_mode & CS_ODD_PRIMARY);

[PATCH 1/8] EDAC/amd64: Fix number of DIMMs and Chip Select bases/masks on Family17h

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

...because AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS
masks per channel.

Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
Signed-off-by: Yazen Ghannam 
---
 drivers/edac/amd64_edac.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 873437be86d9..9fa2f205f05c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -810,7 +810,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 
edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
 
-   for (dimm = 0; dimm < 4; dimm++) {
+   for (dimm = 0; dimm < 2; dimm++) {
size0 = 0;
cs0 = dimm * 2;
 
@@ -942,6 +942,9 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
} else if (pvt->fam == 0x15 && pvt->model == 0x30) {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+   } else if (pvt->fam >= 0x17) {
+   pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
+   pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
} else {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
-- 
2.17.1



[PATCH 0/8] AMD64 EDAC fixes for v5.2

2019-05-31 Thread Ghannam, Yazen
From: Yazen Ghannam 

Hi Boris,

This set contains a few fixes for some changes merged in v5.2. There
are also a couple of fixes for older issues. In addition, there are a
couple of patches to add support for Asymmetric Dual-Rank DIMMs.

Thanks,
Yazen

Yazen Ghannam (8):
  EDAC/amd64: Fix number of DIMMs and Chip Select bases/masks on
Family17h
  EDAC/amd64: Support more than two controllers for chip selects
handling
  EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
  EDAC/amd64: Initialize DIMM info for systems with more than two
channels
  EDAC/amd64: Find Chip Select memory size using Address Mask
  EDAC/amd64: Decode syndrome before translating address
  EDAC/amd64: Cache secondary Chip Select registers
  EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

 drivers/edac/amd64_edac.c | 348 --
 drivers/edac/amd64_edac.h |   9 +-
 2 files changed, 232 insertions(+), 125 deletions(-)

-- 
2.17.1



RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-05-23 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Friday, May 17, 2019 3:02 PM
> To: Ghannam, Yazen 
> Cc: Luck, Tony ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> 
> On Fri, May 17, 2019 at 07:49:10PM +, Ghannam, Yazen wrote:
> > > @@ -1569,7 +1575,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> > >
> > > if (!b->init)
> > > continue;
> > > +
> > > +   /* Check if any bits are implemented in h/w */
> > > wrmsrl(msr_ops.ctl(i), b->ctl);
> > > +   rdmsrl(msr_ops.ctl(i), msrval);
> > > +
> > > +   b->init = !!msrval;
> > > +
> > Just a minor nit, but can we group the comment, RDMSR, and check
> > together? The WRMSR is part of normal operation and isn't tied to the
> > check.
> 
> Of course it is - that's the "throw all 1s at it" part :)
> 

I did a bit more testing and I noticed that writing "0" disables a bank with no 
way to reenable it.

For example:
1) Read bank10.
a) Succeeds; returns "fff".
2) Write "0" to bank10.
a) Succeeds; hardware register is set to "0".
b) Hardware register is checked, and b->init=0.
3) Read bank10.
a) Fails, because b->init=0.
4) Write non-zero value to bank10 to reenable it.
a) Fails, because b->init=0.
5) Reboot needed to reset bank.
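
In code terms, a minimal sketch of the interaction (reusing names from the
patches above; an illustration of the lock-out, not a proposed change):

	/* Step 2: set_bank() stores the user's 0, and the write/read-back
	 * probe then clears the init flag. */
	wrmsrl(msr_ops.ctl(i), b->ctl);		/* b->ctl == 0 */
	rdmsrl(msr_ops.ctl(i), msrval);
	b->init = !!msrval;			/* now 0 */

	/* Steps 3/4: show_bank() and set_bank() both bail out early, so
	 * the bank cannot be re-enabled until reboot. */
	if (!b->init)
		return -ENODEV;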

Is that okay?

Thanks,
Yazen


RE: [PATCH] x86/MCE: Statically allocate mce_banks_array

2019-05-23 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Thursday, May 23, 2019 3:28 PM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH] x86/MCE: Statically allocate mce_banks_array
> 
> 
> On Thu, May 23, 2019 at 03:03:55PM +, Ghannam, Yazen wrote:
> > From: Yazen Ghannam 
> >
> > The MCE control data is stored in an array of struct mce_banks. This
> > array has historically been shared by all CPUs and it was allocated
> > dynamically during the first CPU's init sequence.
> >
> > However, starting with
> >
> >   5b0883f5c7be ("x86/MCE: Make mce_banks a per-CPU array")
> >
> > the array was changed to become a per-CPU array. Each CPU would
> > dynamically allocate the array during its own init sequence.
> >
> > This seems benign except when "Lock Debugging" config options are
> > enabled, in which case the following message appears.
> >
> >   BUG: sleeping function called from invalid context at mm/slab.h:418
> >
> > The message appears during the secondary CPUs' init sequences. This seems
> > to be because these CPUs are in system_state=SYSTEM_SCHEDULING compared
> > to the primary CPU which is in system_state=SYSTEM_BOOTING.
> >
> > Allocate the mce_banks_array statically so that this issue can be
> > avoided.
> >
> > Also, remove the now unnecessary return values from
> > __mcheck_cpu_mce_banks_init() and __mcheck_cpu_cap_init().
> >
> > Fixes: 5b0883f5c7be ("x86/MCE: Make mce_banks a per-CPU array")
> > Reported-by: kernel test robot 
> > Suggested-by: Borislav Petkov 
> > Signed-off-by: Yazen Ghannam 
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 39 --
> >  1 file changed, 14 insertions(+), 25 deletions(-)
> 
> Can you rediff this patch against tip/master please?
> 
> It fixes a patch which is already in -rc1 so it needs to go first, into
> urgent, before your patchset.
> 

Sure, but which patch are you referring to?

This seems to fix a patch in the set in bp/rc0+3-ras.

Thanks,
Yazen


[PATCH] x86/MCE: Statically allocate mce_banks_array

2019-05-23 Thread Ghannam, Yazen
From: Yazen Ghannam 

The MCE control data is stored in an array of struct mce_banks. This
array has historically been shared by all CPUs and it was allocated
dynamically during the first CPU's init sequence.

However, starting with

5b0883f5c7be ("x86/MCE: Make mce_banks a per-CPU array")

the array was changed to become a per-CPU array. Each CPU would
dynamically allocate the array during its own init sequence.

This seems benign except when "Lock Debugging" config options are
enabled, in which case the following message appears.

BUG: sleeping function called from invalid context at mm/slab.h:418

The message appears during the secondary CPUs' init sequences. This seems
to be because these CPUs are in system_state=SYSTEM_SCHEDULING compared
to the primary CPU which is in system_state=SYSTEM_BOOTING.

Allocate the mce_banks_array statically so that this issue can be
avoided.

Also, remove the now unnecessary return values from
__mcheck_cpu_mce_banks_init() and __mcheck_cpu_cap_init().

Fixes: 5b0883f5c7be ("x86/MCE: Make mce_banks a per-CPU array")
Reported-by: kernel test robot 
Suggested-by: Borislav Petkov 
Signed-off-by: Yazen Ghannam 
---
 arch/x86/kernel/cpu/mce/core.c | 39 --
 1 file changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 25e501a853cd..b8eebebbc2f8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -70,7 +70,7 @@ struct mce_bank {
u64 ctl;		/* subevents to enable */
bool init;		/* initialise bank? */
 };
-static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_array);
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank [MAX_NR_BANKS], mce_banks_array);
 
 #define ATTR_LEN   16
 /* One object for each MCE bank, shared by all CPUs */
@@ -690,7 +690,7 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
  */
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
-   struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
+   struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
bool error_seen = false;
struct mce m;
int i;
@@ -1138,7 +1138,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
unsigned long *toclear, unsigned long *valid_banks,
int no_way_out, int *worst)
 {
-   struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
+   struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
struct mca_config *cfg = &mca_cfg;
int severity, i;
 
@@ -1480,16 +1480,12 @@ int mce_notify_irq(void)
 }
 EXPORT_SYMBOL_GPL(mce_notify_irq);
 
-static int __mcheck_cpu_mce_banks_init(void)
+static void __mcheck_cpu_mce_banks_init(void)
 {
+   struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
u8 n_banks = this_cpu_read(mce_num_banks);
-   struct mce_bank *mce_banks;
int i;
 
-   mce_banks = kcalloc(n_banks, sizeof(struct mce_bank), GFP_KERNEL);
-   if (!mce_banks)
-   return -ENOMEM;
-
for (i = 0; i < n_banks; i++) {
struct mce_bank *b = &mce_banks[i];
 
@@ -1501,15 +1497,12 @@ static int __mcheck_cpu_mce_banks_init(void)
b->ctl = -1ULL;
b->init = 1;
}
-
-   per_cpu(mce_banks_array, smp_processor_id()) = mce_banks;
-   return 0;
 }
 
 /*
  * Initialize Machine Checks for a CPU.
  */
-static int __mcheck_cpu_cap_init(void)
+static void __mcheck_cpu_cap_init(void)
 {
u64 cap;
u8 b;
@@ -1526,11 +1519,7 @@ static int __mcheck_cpu_cap_init(void)
 
this_cpu_write(mce_num_banks, b);
 
-   if (!this_cpu_read(mce_banks_array)) {
-   int err = __mcheck_cpu_mce_banks_init();
-   if (err)
-   return err;
-   }
+   __mcheck_cpu_mce_banks_init();
 
/* Use accurate RIP reporting if available. */
if ((cap & MCG_EXT_P) && MCG_EXT_CNT(cap) >= 9)
@@ -1538,8 +1527,6 @@ static int __mcheck_cpu_cap_init(void)
 
if (cap & MCG_SER_P)
mca_cfg.ser = 1;
-
-   return 0;
 }
 
 static void __mcheck_cpu_init_generic(void)
@@ -1566,7 +1553,7 @@ static void __mcheck_cpu_init_generic(void)
 
 static void __mcheck_cpu_init_clear_banks(void)
 {
-   struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
+   struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
u64 msrval;
int i;
 
@@ -1617,7 +1604,7 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
 /* Add per CPU specific workarounds here */
 static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 {
-   struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
+   struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
struct mca_config *cfg = 

RE: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

2019-05-22 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Tuesday, May 21, 2019 6:09 PM
> To: Luck, Tony 
> Cc: Ghannam, Yazen ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu
> 
> 
> On Tue, May 21, 2019 at 01:42:40PM -0700, Luck, Tony wrote:
> > On Tue, May 21, 2019 at 10:29:02PM +0200, Borislav Petkov wrote:
> > >
> > > Can we do instead:
> > >
> > > -static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_array);
> > > +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank, 
> > > mce_banks_array[MAX_NR_BANKS]);
> > >
> > > which should be something like 9*32 = 288 bytes per CPU.
> > >
> >
> > Where did you get the "9" from?  struct mce_bank looks to
> > be over 50 bytes.
> 
> Patch 2/6 changes that:
> 
>  struct mce_bank {
> u64 ctl;		/* subevents to enable */
> bool init;		/* initialise bank? */
> +};
> +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_percpu);
> +
> +#define ATTR_LEN   16
> +/* One object for each MCE bank, shared by all CPUs */
> +struct mce_bank_dev {
> struct device_attribute attr;   /* device attribute */
> char attrname[ATTR_LEN];	/* attribute name */
> +   u8  bank;   /* bank number */
>  };
> +static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];
> 
> > Still only 1.5K per cpu though.
> 
> Yah, I think that using static per-CPU memory should be better than
> GFP_ATOMIC.
> 

Okay, makes sense. I'll send a patch soon.

Thanks,
Yazen


RE: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

2019-05-21 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Saturday, May 18, 2019 6:26 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; b...@suse.de; 
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu
> 
> 
> On Tue, Apr 30, 2019 at 08:32:20PM +, Ghannam, Yazen wrote:
> > From: Yazen Ghannam 
> >
> > The number of MCA banks is provided per logical CPU. Historically, this
> > number has been the same across all CPUs, but this is not an
> > architectural guarantee. Future AMD systems may have MCA bank counts
> > that vary between logical CPUs in a system.
> >
> > This issue was partially addressed in
> >
> > 006c077041dc ("x86/mce: Handle varying MCA bank counts")
> >
> > by allocating structures using the maximum number of MCA banks and by
> > saving the maximum MCA bank count in a system as the global count. This
> > means that some extra structures are allocated. Also, this means that
> > CPUs will spend more time in the #MC and other handlers checking extra
> > MCA banks.
> 
> ...
> 
> > @@ -1480,14 +1482,15 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);
> >
> >  static int __mcheck_cpu_mce_banks_init(void)
> >  {
> > + u8 n_banks = this_cpu_read(mce_num_banks);
> >   struct mce_bank *mce_banks;
> >   int i;
> >
> > - mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
> > + mce_banks = kcalloc(n_banks, sizeof(struct mce_bank), GFP_KERNEL);
> 
> Something changed in mm land or maybe we were lucky and got away with an
> atomic GFP_KERNEL allocation until now but:
> 
> [2.447838] smp: Bringing up secondary CPUs ...
> [2.456895] x86: Booting SMP configuration:
> [2.457822]  node  #0, CPUs:#1

The issue seems to be that the allocation is now happening on CPUs other than 
CPU0.

Patch 2 in this set has the same issue. I didn't see it until I turned on the 
"Lock Debugging" config options.

> [1.344284] BUG: sleeping function called from invalid context at mm/slab.h:418

This message comes from ___might_sleep() which checks the system_state.

On CPU0, system_state=SYSTEM_BOOTING.

On every other CPU, system_state=SYSTEM_SCHEDULING, and that's the only 
system_state where the message is shown.
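
Roughly, the gate in ___might_sleep() looks like this (a simplified sketch,
not the exact kernel code):

	/* the report is suppressed during early boot */
	if (system_state == SYSTEM_BOOTING || oops_in_progress)
		return;		/* CPU0 allocates while SYSTEM_BOOTING */

	/* secondary CPUs come up in SYSTEM_SCHEDULING, so their
	 * atomic-context GFP_KERNEL allocation falls through to: */
	printk(KERN_ERR "BUG: sleeping function called from invalid context\n");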

Changing GFP_KERNEL to GFP_ATOMIC seems to be a fix. Is this appropriate? Or do 
you think there's something else we could try?

Thanks,
Yazen



RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-05-17 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, May 17, 2019 2:35 PM
> To: Luck, Tony 
> Cc: Ghannam, Yazen ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> 
> On Fri, May 17, 2019 at 11:06:07AM -0700, Luck, Tony wrote:
> > and thus end up with that extra level on indent for the rest
> > of the function.
> 
> Ok:
> 
> ---
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..25e501a853cd 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1493,6 +1493,11 @@ static int __mcheck_cpu_mce_banks_init(void)
> for (i = 0; i < n_banks; i++) {
> struct mce_bank *b = &mce_banks[i];
> 
> +   /*
> +* Init them all, __mcheck_cpu_apply_quirks() is going to 
> apply
> +* the required vendor quirks before
> +* __mcheck_cpu_init_clear_banks() does the final bank setup.
> +*/
> b->ctl = -1ULL;
> b->init = 1;
> }
> @@ -1562,6 +1567,7 @@ static void __mcheck_cpu_init_generic(void)
>  static void __mcheck_cpu_init_clear_banks(void)
>  {
> struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> +   u64 msrval;
> int i;
> 
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> @@ -1569,7 +1575,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> 
> if (!b->init)
> continue;
> +
> +   /* Check if any bits are implemented in h/w */
> wrmsrl(msr_ops.ctl(i), b->ctl);
> +   rdmsrl(msr_ops.ctl(i), msrval);
> +
> +   b->init = !!msrval;
> +

Just a minor nit, but can we group the comment, RDMSR, and check together? The 
WRMSR is part of normal operation and isn't tied to the check.

Thanks,
Yazen


RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-05-17 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Friday, May 17, 2019 5:10 AM
> To: Luck, Tony 
> Cc: Ghannam, Yazen ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> 
> On Thu, May 16, 2019 at 01:59:43PM -0700, Luck, Tony wrote:
> > I think the intent of the original patch was to find out
> > which bits are "implemented in hardware". I.e. throw all
> > 1's at the register and see if any of them stick.
> 
> And, in addition, check ->init before showing/setting a bank:
> 
> ---
> @@ -2095,6 +2098,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
> 
> b = &per_cpu(mce_banks_array, s->id)[bank];
> 
> +   if (!b->init)
> +   return -ENODEV;
> +
> return sprintf(buf, "%llx\n", b->ctl);
>  }
> 
> @@ -2113,6 +2119,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
> 
> b = &per_cpu(mce_banks_array, s->id)[bank];
> 
> +   if (!b->init)
> +   return -ENODEV;
> +
> b->ctl = new;
> mce_restart();
> ---
> 
> so that you get a feedback whether the setting has even succeeded or
> not. Right now we're doing "something" blindly and accepting any b->ctl
> from userspace. Yeah, it is root-only but still...
> 
> > I don't object to the idea behind the patch. But if you want
> > to do this you just should not modify b->ctl.
> >
> > So something like:
> >
> >
> > static void __mcheck_cpu_init_clear_banks(void)
> > {
> > struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> >   u64 tmp;
> > int i;
> >
> > for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> > struct mce_bank *b = &mce_banks[i];
> >
> > if (b->init) {
> > wrmsrl(msr_ops.ctl(i), b->ctl);
> > wrmsrl(msr_ops.status(i), 0);
> >   rdmsrl(msr_ops.ctl(i), tmp);
> >
> >   /* Check if any bits implemented in h/w */
> >   b->init = !!tmp;
> > }
> 
> ... except that we unconditionally set ->init to 1 in
> __mcheck_cpu_mce_banks_init() and I think we should query it. Btw, that
> name __mcheck_cpu_mce_banks_init() is hideous too. I'll fix those up. In
> the meantime, how does the below look like? The change is to tickle out
> from the hw whether some CTL bits stick and then use that to determine
> b->init setting:
> 
> ---
> From: Yazen Ghannam 
> Date: Tue, 30 Apr 2019 20:32:21 +
> Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly
> 
> The OS is expected to write all bits to MCA_CTL for each bank,
> thus enabling error reporting in all banks. However, some banks
> may be unused in which case the registers for such banks are
> Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
> bits because of quirks, etc.
> 
> A bank can be considered uninitialized if the MCA_CTL register returns
> zero. This is because either the OS did not write anything or because
> the hardware is enforcing RAZ/WI for the bank.
> 
> Set a bank's init value based on if the control bits are set or not in
> hardware. Return an error code in the sysfs interface for uninitialized
> banks.
> 
>  [ bp: Massage a bit. Discover bank init state at boot. ]
> 
> Signed-off-by: Yazen Ghannam 
> Signed-off-by: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Ingo Molnar 
> Cc: "linux-e...@vger.kernel.org" 
> Cc: Thomas Gleixner 
> Cc: Tony Luck 
> Cc: "x...@kernel.org" 
> Link: https://lkml.kernel.org/r/20190430203206.104163-7-yazen.ghan...@amd.com
> ---
>  arch/x86/kernel/cpu/mce/core.c | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..d84b0c707d0e 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1492,9 +1492,16 @@ static int __mcheck_cpu_mce_banks_init(void)
> 
> for (i = 0; i < n_banks; i++) {
> struct mce_bank *b = &mce_banks[i];
> +   u64 val;
> 
> b->ctl = -1ULL;
> -   b->init = 1;
> +
> +   /* Check if any bits are implemented in h/w */
> +   wrmsrl(msr_ops.ctl(i), b->ctl);
> +   rdmsrl(msr_ops.ctl(i), val);
> +   b->init = !!val;
> +
> +   wrmsrl(msr_ops.status(i), 0);
> }

I think there are a couple of issues here.
1) The bank is being initialized without accounting for any quirks.
2) The bank is being initialized without having set up any handler or other 
appropriate setup.

Thanks,
Yazen



RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-05-16 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Thursday, May 16, 2019 12:21 PM
> To: Ghannam, Yazen 
> Cc: Luck, Tony ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> 
> On Thu, May 16, 2019 at 05:09:11PM +, Ghannam, Yazen wrote:
> > So that the sysfs files show the control values that are set in the
> > hardware. It seemed like this would be more helpful than showing all
> > 0xF's.
> 
> Yeah, but it has been like that since forever and it hasn't bugged
> anybody. Probably because anybody doesn't even look at those files. As
> Tony says:
> 
> "RAS is a lonely subsystem ... even EDAC gets more love."
> 
> :-)))
> 
> And adding yet another vendor check for this seemed just not worth it.
> 
> > Should I send out another version of this set?
> 
> I simply zapped 5/6. I still think your 6/6 makes sense though.
> 
> ---
> From: Yazen Ghannam 
> Date: Tue, 30 Apr 2019 20:32:21 +
> Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly
> 
> The OS is expected to write all bits to MCA_CTL for each bank,
> thus enabling error reporting in all banks. However, some banks
> may be unused in which case the registers for such banks are
> Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
> bits because of quirks, etc.
> 
> A bank can be considered uninitialized if the MCA_CTL register returns
> zero. This is because either the OS did not write anything or because
> the hardware is enforcing RAZ/WI for the bank.
> 
> Set a bank's init value based on if the control bits are set or not in
> hardware. Return an error code in the sysfs interface for uninitialized
> banks.
> 
>  [ bp: Massage a bit. ]
> 
> Signed-off-by: Yazen Ghannam 
> Signed-off-by: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Ingo Molnar 
> Cc: "linux-e...@vger.kernel.org" 
> Cc: Thomas Gleixner 
> Cc: Tony Luck 
> Cc: "x...@kernel.org" 
> Link: https://lkml.kernel.org/r/20190430203206.104163-7-yazen.ghan...@amd.com
> ---
>  arch/x86/kernel/cpu/mce/core.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..c049689f3d73 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> struct mce_bank *b = &mce_banks[i];
> 
> -   if (!b->init)
> -   continue;
> -   wrmsrl(msr_ops.ctl(i), b->ctl);
> -   wrmsrl(msr_ops.status(i), 0);
> +   if (b->init) {
> +   wrmsrl(msr_ops.ctl(i), b->ctl);
> +   wrmsrl(msr_ops.status(i), 0);
> +   }
> +
> +   /* Bank is initialized if bits are set in hardware. */
> +   b->init = !!b->ctl;

We don't actually know if there are bits set in hardware until we read it back. 
So I don't think this is adding anything new.

Thanks,
Yazen


RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-05-16 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Thursday, May 16, 2019 11:57 AM
> To: Ghannam, Yazen 
> Cc: Luck, Tony ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> 
> On Thu, May 16, 2019 at 04:14:14PM +, Ghannam, Yazen wrote:
> > I can put a vendor check on the read. Is that sufficient?
> 
> Or we can drop this patch. Remind me again pls why do we need it?
> 

So that the sysfs files show the control values that are set in the hardware. 
It seemed like this would be more helpful than showing all 0xF's.

But I'm okay with dropping this patch. Patch 6 in this set depends on this, so 
it'll need to be dropped also.

Should I send out another version of this set?

Thanks,
Yazen


RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-05-16 Thread Ghannam, Yazen
> -Original Message-
> From: Luck, Tony 
> Sent: Thursday, May 16, 2019 10:52 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; b...@suse.de; 
> x...@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in 
> hardware
> 
> 
> On Tue, Apr 30, 2019 at 08:32:20PM +, Ghannam, Yazen wrote:
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 986de830f26e..551366c155ef 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> >   for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> > struct mce_bank *b = &mce_banks[i];
> >
> > - if (!b->init)
> > - continue;
> > - wrmsrl(msr_ops.ctl(i), b->ctl);
> > - wrmsrl(msr_ops.status(i), 0);
> > + if (b->init) {
> > + wrmsrl(msr_ops.ctl(i), b->ctl);
> > + wrmsrl(msr_ops.status(i), 0);
> > + }
> > +
> > + /* Save bits set in hardware. */
> > + rdmsrl(msr_ops.ctl(i), b->ctl);
> >   }
> >  }
> 
> This looks like it will be a problem for Intel CPUs. If
> we take a CPU offline, and then bring it back again, we
> ues "b->ctl" to reinitialize the register in mce_reenable_cpu().
> 
> But Intel SDM says at the end of section "15.3.2.1 IA32_MCi_CTL_MSRs"
> 
> "P6 family processors only allow the writing of all 1s or all
> 0s to the IA32_MCi_CTL MSR."
> 

I can put a vendor check on the read. Is that sufficient?
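
Something like this, perhaps (a sketch against the hunk above; the vendor
check is the only new piece):

	if (b->init) {
		wrmsrl(msr_ops.ctl(i), b->ctl);
		wrmsrl(msr_ops.status(i), 0);
	}

	/* Only save what stuck in hardware on AMD; P6-family Intel parts
	 * accept only all 0s or all 1s in IA32_MCi_CTL. */
	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
		rdmsrl(msr_ops.ctl(i), b->ctl);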

Thanks,
Yazen


[PATCH v3 1/6] x86/MCE: Make struct mce_banks[] static

2019-04-30 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct mce_banks[] array is only used in mce/core.c so move the
definition of struct mce_bank to mce/core.c and make the array static.

Also, change the "init" field to bool type.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190411201743.43195-2-yazen.ghan...@amd.com

v2->v3:
* No changes

v1->v2:
* No changes

 arch/x86/kernel/cpu/mce/core.c | 11 ++-
 arch/x86/kernel/cpu/mce/internal.h | 10 --
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5112a50e6486..ba5767dd5538 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,7 +64,16 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
-struct mce_bank *mce_banks __read_mostly;
+#define ATTR_LEN   16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank {
+	u64 ctl;		/* subevents to enable */
+	bool init;		/* initialise bank? */
+   struct device_attribute attr;   /* device attribute */
+	char attrname[ATTR_LEN];	/* attribute name */
+};
+
+static struct mce_bank *mce_banks __read_mostly;
 struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index a34b55baa7aa..35b3e5c02c1c 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -22,17 +22,8 @@ enum severity_level {
 
 extern struct blocking_notifier_head x86_mce_decoder_chain;
 
-#define ATTR_LEN   16
 #define INITIAL_CHECK_INTERVAL 5 * 60 /* 5 minutes */
 
-/* One object for each MCE bank, shared by all CPUs */
-struct mce_bank {
-	u64 ctl;		/* subevents to enable */
-	unsigned char init;	/* initialise bank? */
-   struct device_attribute attr;   /* device attribute */
-	char attrname[ATTR_LEN];	/* attribute name */
-};
-
 struct mce_evt_llist {
struct llist_node llnode;
struct mce mce;
@@ -47,7 +38,6 @@ struct llist_node *mce_gen_pool_prepare_records(void);
 extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
 struct dentry *mce_get_debugfs_dir(void);
 
-extern struct mce_bank *mce_banks;
 extern mce_banks_t mce_banks_ce_disabled;
 
 #ifdef CONFIG_X86_MCE_INTEL
-- 
2.17.1



[PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-04-30 Thread Ghannam, Yazen
From: Yazen Ghannam 

The OS is expected to write all bits in MCA_CTL. However, only
implemented bits get set in the hardware.

Read back MCA_CTL so that the value in the hardware is saved and
reported through sysfs.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190411201743.43195-6-yazen.ghan...@amd.com

v2->v3:
* No change.

v1->v2:
* No change.

 arch/x86/kernel/cpu/mce/core.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 986de830f26e..551366c155ef 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];
 
-   if (!b->init)
-   continue;
-   wrmsrl(msr_ops.ctl(i), b->ctl);
-   wrmsrl(msr_ops.status(i), 0);
+   if (b->init) {
+   wrmsrl(msr_ops.ctl(i), b->ctl);
+   wrmsrl(msr_ops.status(i), 0);
+   }
+
+   /* Save bits set in hardware. */
+   rdmsrl(msr_ops.ctl(i), b->ctl);
}
 }
 
@@ -2325,8 +2328,10 @@ static void mce_reenable_cpu(void)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &mce_banks[i];
 
-   if (b->init)
+   if (b->init) {
wrmsrl(msr_ops.ctl(i), b->ctl);
+   rdmsrl(msr_ops.ctl(i), b->ctl);
+   }
}
 }
 
-- 
2.17.1



[PATCH v3 6/6] x86/MCE: Treat MCE bank as initialized if control bits set in hardware

2019-04-30 Thread Ghannam, Yazen
From: Yazen Ghannam 

The OS is expected to write all bits to MCA_CTL for each bank. However,
some banks may be unused in which case the registers for such banks are
Read-as-Zero/Writes-Ignored. Also, the OS may not write any control bits
because of quirks, etc.

A bank can be considered uninitialized if the MCA_CTL register returns
zero. This is because either the OS did not write anything or because
the hardware is enforcing RAZ/WI for the bank.

Set a bank's init value based on if the control bits are set or not in
hardware.

Return an error code in the sysfs interface for uninitialized banks.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190411201743.43195-7-yazen.ghan...@amd.com

v2->v3:
* No change.

v1->v2:
* New in v2.
* Based on discussion from v1 patch 2.

 arch/x86/kernel/cpu/mce/core.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 551366c155ef..e59947e10ee0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1574,6 +1574,9 @@ static void __mcheck_cpu_init_clear_banks(void)
 
/* Save bits set in hardware. */
rdmsrl(msr_ops.ctl(i), b->ctl);
+
+   /* Bank is initialized if bits are set in hardware. */
+   b->init = !!b->ctl;
}
 }
 
@@ -2098,6 +2101,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
 
b = &per_cpu(mce_banks_percpu, s->id)[bank];
 
+   if (!b->init)
+   return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
 }
 
@@ -2116,6 +2122,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 
b = &per_cpu(mce_banks_percpu, s->id)[bank];
 
+   if (!b->init)
+   return -ENODEV;
+
b->ctl = new;
mce_restart();
 
-- 
2.17.1



[PATCH v3 0/6] Handle MCA banks in a per_cpu way

2019-04-30 Thread Ghannam, Yazen
From: Yazen Ghannam 

The focus of this patchset is define and use the MCA bank structures
and bank count per logical CPU.

With the exception of patch 4, this set applies to systems in production
today.

Patch 1:
Moves the declaration of struct mce_banks[] to the only file it's used.

Patch 2:
Splits struct mce_bank into a structure for fields common to MCA banks
on all CPUs and another structure that can be used per_cpu.

Patch 3:
Brings full circle the saga of the threshold block addresses on SMCA
systems. After taking a step back and reviewing the AMD documentation, I
think that this implementation is the simplest and most robust way to
follow the spec.

Patch 4:
Saves and uses the MCA bank count as a per_cpu variable. This is to
support systems that have MCA bank counts that are different between
logical CPUs.

Patch 5:
Makes sure that sysfs reports the MCA_CTL value as set in hardware. This
is not something related to making things per_cpu but rather just
something I noticed while working on the other patches.

Patch 6:
Prevents sysfs access for MCA banks that are uninitialized.

Link:
https://lkml.kernel.org/r/20190411201743.43195-1-yazen.ghan...@amd.com

Thanks,
Yazen

Yazen Ghannam (6):
  x86/MCE: Make struct mce_banks[] static
  x86/MCE: Handle MCA controls in a per_cpu way
  x86/MCE/AMD: Don't cache block addresses on SMCA systems
  x86/MCE: Make number of MCA banks per_cpu
  x86/MCE: Save MCA control bits that get set in hardware
  x86/MCE: Treat MCE bank as initialized if control bits set in hardware

 arch/x86/kernel/cpu/mce/amd.c  |  90 ++--
 arch/x86/kernel/cpu/mce/core.c | 131 +
 arch/x86/kernel/cpu/mce/internal.h |  12 +--
 3 files changed, 142 insertions(+), 91 deletions(-)

-- 
2.17.1



[PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu

2019-04-30 Thread Ghannam, Yazen
From: Yazen Ghannam 

The number of MCA banks is provided per logical CPU. Historically, this
number has been the same across all CPUs, but this is not an
architectural guarantee. Future AMD systems may have MCA bank counts
that vary between logical CPUs in a system.

This issue was partially addressed in

006c077041dc ("x86/mce: Handle varying MCA bank counts")

by allocating structures using the maximum number of MCA banks and by
saving the maximum MCA bank count in a system as the global count. This
means that some extra structures are allocated. Also, this means that
CPUs will spend more time in the #MC and other handlers checking extra
MCA banks.

Define the number of MCA banks as a per_cpu variable. Replace all uses
of mca_cfg.banks with this.

Also, use the per_cpu bank count when allocating per_cpu structures.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190411201743.43195-5-yazen.ghan...@amd.com

v2->v3:
* Drop pr_debug() message.
* Change commit reference format.

v1->v2:
* Drop export of new variable and leave injector code as-is.
* Add "mce_" prefix to new "num_banks" variable.

 arch/x86/kernel/cpu/mce/amd.c  | 17 ++-
 arch/x86/kernel/cpu/mce/core.c | 47 +-
 arch/x86/kernel/cpu/mce/internal.h |  2 +-
 3 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d4d6e4b7f9dc..9f729d50676c 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -495,7 +495,7 @@ static u32 get_block_address(u32 current_addr, u32 low, u32 high,
 {
u32 addr = 0, offset = 0;
 
-   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+   if ((bank >= per_cpu(mce_num_banks, cpu)) || (block >= NR_BLOCKS))
return addr;
 
if (mce_flags.smca)
@@ -631,7 +631,8 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
unsigned int bank, block, cpu = smp_processor_id();
int offset = -1;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (mce_flags.smca)
smca_configure(bank, cpu);
 
@@ -976,7 +977,7 @@ static void amd_deferred_error_interrupt(void)
 {
unsigned int bank;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank)
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank)
log_error_deferred(bank);
 }
 
@@ -1017,7 +1018,7 @@ static void amd_threshold_interrupt(void)
struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
unsigned int bank, cpu = smp_processor_id();
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
 
@@ -1204,7 +1205,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
u32 low, high;
int err;
 
-   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+   if ((bank >= per_cpu(mce_num_banks, cpu)) || (block >= NR_BLOCKS))
return 0;
 
if (rdmsr_safe_on_cpu(cpu, address, &low, &high))
@@ -1438,7 +1439,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 {
unsigned int bank;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < per_cpu(mce_num_banks, cpu); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
threshold_remove_bank(cpu, bank);
@@ -1459,14 +1460,14 @@ int mce_threshold_create_device(unsigned int cpu)
if (bp)
return 0;
 
-   bp = kcalloc(mca_cfg.banks, sizeof(struct threshold_bank *),
+   bp = kcalloc(per_cpu(mce_num_banks, cpu), sizeof(struct threshold_bank *),
 GFP_KERNEL);
if (!bp)
return -ENOMEM;
 
per_cpu(threshold_banks, cpu) = bp;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < per_cpu(mce_num_banks, cpu); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
err = threshold_create_bank(cpu, bank);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 66347bdc8b08..986de830f26e 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,6 +64,8 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
+DEFINE_PER_CPU_READ_MOSTLY(u8, mce_num_banks);
+
 struct mce_bank {
u64 ctl;		/* subevents to enable */
bool init;		/* initialise bank? */
@@ -700,7 +702,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (flags & MCP_TIMESTAMP)
m.tsc = 

[PATCH v3 2/6] x86/MCE: Handle MCA controls in a per_cpu way

2019-04-30 Thread Ghannam, Yazen
From: Yazen Ghannam 

Current AMD systems have unique MCA banks per logical CPU even though
the type of the banks may all align to the same bank number. Each CPU
will have control of a set of MCA banks in the hardware and these are
not shared with other CPUs.

For example, bank 0 may be the Load-Store Unit on every logical CPU, but
each bank 0 is a unique structure in the hardware. In other words, there
isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
share.

This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
may see a Unified Memory Controller at bank 15, but each CPU is actually
seeing a unique hardware structure that is not shared with other CPUs.

Because the MCA banks are all unique hardware structures, it would be
good to control them in a more granular way. For example, if there is a
known issue with the Floating Point Unit on CPU5 and a user wishes to
disable an error type on the Floating Point Unit, then it would be good
to do this only for CPU5 rather than all CPUs.

Also, future AMD systems may have heterogeneous MCA banks. Meaning the
bank numbers may not necessarily represent the same types between CPUs.
For example, bank 20 visible to CPU0 may be a Unified Memory Controller
and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
will be even more necessary should the user wish to control specific MCA
banks.

Split the device attributes from struct mce_bank leaving only the MCA
bank control fields.

Make struct mce_banks[] per_cpu in order to have more granular control
over individual MCA banks in the hardware.

Allocate the device attributes statically based on the maximum number of
MCA banks supported. The sysfs interface will use as many as needed per
CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
per_cpu bank count in a future patch.

Allocate the MCA control bits dynamically. Use the maximum number of MCA
banks supported for now. This will be changed to a per_cpu bank count in
a future patch.

Redo the sysfs store/show functions to handle the per_cpu mce_banks[].

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190411201743.43195-3-yazen.ghan...@amd.com

v2->v3:
* Keep old member alignment in struct mce_bank.
* Change "cpu" to "CPU" in modified comment.
* Use a local array pointer when doing multiple per_cpu accesses.

v1->v2:
* Change "struct mce_bank*" to "struct mce_bank *" in definition.

 arch/x86/kernel/cpu/mce/core.c | 59 ++
 1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ba5767dd5538..66347bdc8b08 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
-#define ATTR_LEN   16
-/* One object for each MCE bank, shared by all CPUs */
 struct mce_bank {
u64 ctl;		/* subevents to enable */
bool init;		/* initialise bank? */
+};
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks_percpu);
+
+#define ATTR_LEN   16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank_dev {
struct device_attribute attr;   /* device attribute */
char attrname[ATTR_LEN];	/* attribute name */
+   u8  bank;   /* bank number */
 };
+static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];
 
-static struct mce_bank *mce_banks __read_mostly;
 struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
@@ -683,6 +688,7 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
  */
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
+   struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
bool error_seen = false;
struct mce m;
int i;
@@ -1130,6 +1136,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
unsigned long *toclear, unsigned long *valid_banks,
int no_way_out, int *worst)
 {
+   struct mce_bank *mce_banks = this_cpu_read(mce_banks_percpu);
struct mca_config *cfg = &mca_cfg;
int severity, i;
 
@@ -1473,6 +1480,7 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);
 
 static int __mcheck_cpu_mce_banks_init(void)
 {
+   struct mce_bank *mce_banks;
int i;
 
mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
@@ -1485,6 +1493,8 @@ static int __mcheck_cpu_mce_banks_init(void)
b->ctl = -1ULL;
b->init = 1;
}
+
+   per_cpu(mce_banks_percpu, smp_processor_id()) = mce_banks;
return 0;
 }
 
@@ -1504,7 +1514,7 @@ static int __mcheck_cpu_cap_init(void)
 
mca_cfg.banks = max(mca_cfg.banks, b);

[PATCH v3 3/6] x86/MCE/AMD: Don't cache block addresses on SMCA systems

2019-04-30 Thread Ghannam, Yazen
From: Yazen Ghannam 

On legacy systems, the addresses of the MCA_MISC* registers need to be
recursively discovered based on a Block Pointer field in the registers.

On Scalable MCA systems, the register space is fixed, and particular
addresses can be derived by regular offsets for bank and register type.
This fixed address space includes the MCA_MISC* registers.

MCA_MISC0 is always available for each MCA bank. MCA_MISC1 through
MCA_MISC4 are considered available if MCA_MISC0[BlkPtr]=1.

Cache the value of MCA_MISC0[BlkPtr] for each bank and per CPU. This
needs to be done only during init. The values should be saved per CPU
to accommodate heterogeneous SMCA systems.

Redo smca_get_block_address() to directly return the block addresses.
Use the cached Block Pointer value to decide if the MCA_MISC1-4
addresses should be returned.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190411201743.43195-4-yazen.ghan...@amd.com

v2->v3:
* Change name of new map variable to "smca_misc_banks_map".
* Use "BIT()" where appropriate.

v1->v2:
* No change.

 arch/x86/kernel/cpu/mce/amd.c | 73 ++-
 1 file changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d904aafe6409..d4d6e4b7f9dc 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -101,11 +101,6 @@ static struct smca_bank_name smca_names[] = {
[SMCA_PCIE] = { "pcie", "PCI Express Unit" },
 };
 
-static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
-{
-   [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
-};
-
 static const char *smca_get_name(enum smca_bank_types t)
 {
if (t >= N_SMCA_BANK_TYPES)
@@ -199,6 +194,9 @@ static char buf_mcatype[MAX_MCATYPE_NAME_LEN];
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned int, bank_map); /* see which banks are on */
 
+/* Map of banks that have more than MCA_MISC0 available. */
+static DEFINE_PER_CPU(u32, smca_misc_banks_map);
+
 static void amd_threshold_interrupt(void);
 static void amd_deferred_error_interrupt(void);
 
@@ -208,6 +206,28 @@ static void default_deferred_error_interrupt(void)
 }
 void (*deferred_error_int_vector)(void) = default_deferred_error_interrupt;
 
+static void smca_set_misc_banks_map(unsigned int bank, unsigned int cpu)
+{
+   u32 low, high;
+
+   /*
+* For SMCA enabled processors, BLKPTR field of the first MISC register
+* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
+*/
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
+   return;
+
+   if (!(low & MCI_CONFIG_MCAX))
+   return;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high))
+   return;
+
+   if (low & MASK_BLKPTR_LO)
+   per_cpu(smca_misc_banks_map, cpu) |= BIT(bank);
+
+}
+
 static void smca_configure(unsigned int bank, unsigned int cpu)
 {
unsigned int i, hwid_mcatype;
@@ -245,6 +265,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
wrmsr(smca_config, low, high);
}
 
+   smca_set_misc_banks_map(bank, cpu);
+
/* Return early if this bank was already initialized. */
if (smca_banks[bank].hwid)
return;
@@ -455,42 +477,21 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
-static u32 smca_get_block_address(unsigned int bank, unsigned int block)
+static u32 smca_get_block_address(unsigned int bank, unsigned int block,
+ unsigned int cpu)
 {
-   u32 low, high;
-   u32 addr = 0;
-
-   if (smca_get_bank_type(bank) == SMCA_RESERVED)
-   return addr;
-
if (!block)
return MSR_AMD64_SMCA_MCx_MISC(bank);
 
-   /* Check our cache first: */
-   if (smca_bank_addrs[bank][block] != -1)
-   return smca_bank_addrs[bank][block];
-
-   /*
-* For SMCA enabled processors, BLKPTR field of the first MISC register
-* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
-*/
-   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
-   goto out;
-
-   if (!(low & MCI_CONFIG_MCAX))
-   goto out;
-
-   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
-   (low & MASK_BLKPTR_LO))
-   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   if (!(per_cpu(smca_misc_banks_map, cpu) & BIT(bank)))
+   return 0;
 
-out:
-   smca_bank_addrs[bank][block] = addr;
-   return addr;
+   return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
 }
 
 static u32 get_block_address(u32 current_addr, u32 low, u32 high,
-unsigned int bank, unsigned int block)
+unsigned int bank, 

RE: [PATCH] tools/power turbostat: Make interval calculation per thread to reduce jitter

2019-04-23 Thread Ghannam, Yazen
> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf Of Ghannam, Yazen
> Sent: Monday, March 25, 2019 12:33 PM
> To: linux...@vger.kernel.org
> Cc: Ghannam, Yazen ; linux-kernel@vger.kernel.org; 
> l...@kernel.org
> Subject: [PATCH] tools/power turbostat: Make interval calculation per thread 
> to reduce jitter
> 
> From: Yazen Ghannam 
> 
> Turbostat currently normalizes TSC and other values by dividing by an
> interval. This interval is the delta between the start of one global
> (all counters on all CPUs) sampling and the start of another. However,
> this introduces a lot of jitter into the data.
> 
> In order to reduce jitter, the interval calculation should be based on
> timestamps taken per thread and close to the start of the thread's
> sampling.
> 
> Define a per thread time value to hold the delta between samples taken
> on the thread.
> 
> Use the timestamp taken at the beginning of sampling to calculate the
> delta.
> 
> Move the thread's beginning timestamp to after the CPU migration to
> avoid jitter due to the migration.
> 
> Use the global time delta for the average time delta.
> 
> Signed-off-by: Yazen Ghannam 
> ---
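
For anyone skimming, a minimal userspace sketch of the per-thread interval
idea described above (not turbostat's actual code; the helper below is a
stand-in for its cpu_migrate logic):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <time.h>

	struct thread_data {
		struct timespec ts_prev;	/* start of previous sample */
		double interval;		/* per-thread delta, seconds */
		int cpu_id;
	};

	static void migrate_to_cpu(int cpu)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		sched_setaffinity(0, sizeof(set), &set);	/* best effort */
	}

	static void sample_one_thread(struct thread_data *t)
	{
		struct timespec now;

		migrate_to_cpu(t->cpu_id);

		/* timestamp *after* the migration so its cost doesn't
		 * leak into this thread's interval */
		clock_gettime(CLOCK_MONOTONIC, &now);

		t->interval = (now.tv_sec - t->ts_prev.tv_sec) +
			      (now.tv_nsec - t->ts_prev.tv_nsec) / 1e9;
		t->ts_prev = now;

		/* counter deltas read here are divided by t->interval
		 * instead of a single global interval */
	}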

Hi Len,

Any comments on this patch?

Thanks,
Yazen


RE: [PATCH v2 0/7] CPPC optional registers AMD support

2019-04-18 Thread Ghannam, Yazen
> -Original Message-
> From: Rafael J. Wysocki 
> Sent: Wednesday, April 17, 2019 5:11 PM
> To: Ghannam, Yazen 
> Cc: Rafael J. Wysocki ; Natarajan, Janakarajan 
> ; linux-a...@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux...@vger.kernel.org; de...@acpica.org; Rafael J 
> . Wysocki ; Len Brown
> ; Viresh Kumar ; Robert Moore 
> ; Erik Schmauss
> 
> Subject: Re: [PATCH v2 0/7] CPPC optional registers AMD support
> 
> On Wed, Apr 17, 2019 at 7:28 PM Ghannam, Yazen  wrote:
> >
> > > -Original Message-
> > > From: Rafael J. Wysocki 
> > > Sent: Tuesday, April 16, 2019 4:34 AM
> > > To: Natarajan, Janakarajan 
> > > Cc: Natarajan, Janakarajan ; 
> > > linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> > > p...@vger.kernel.org; de...@acpica.org; Rafael J . Wysocki 
> > > ; Len Brown ; Viresh Kumar
> > > ; Robert Moore ; Erik 
> > > Schmauss ; Ghannam, Yazen
> > > 
> > > Subject: Re: [PATCH v2 0/7] CPPC optional registers AMD support
> > >
> > > On Tue, Apr 16, 2019 at 12:35 AM Janakarajan Natarajan  
> > > wrote:
> > > >
> > > > On 4/4/19 4:25 PM, Natarajan, Janakarajan wrote:
> > > > > CPPC (Collaborative Processor Performance Control) offers optional
> > > > > registers which can be used to tune the system based on energy and/or
> > > > > performance requirements.
> > > > >
> > > > > Newer AMD processors add support for a subset of these optional CPPC
> > > > > registers, based on ACPI v6.1.
> > > > >
> > > > > The following are the supported CPPC registers for which sysfs entries
> > > > > are created:
> > > > > * enable(NEW)
> > > > > * max_perf  (NEW)
> > > > > * min_perf  (NEW)
> > > > > * energy_perf
> > > > > * lowest_perf
> > > > > * nominal_perf
> > > > > * desired_perf  (NEW)
> > > > > * feedback_ctrs
> > > > > * auto_sel_enable   (NEW)
> > > > > * lowest_nonlinear_perf
> > > > >
> > > > > The CPPC driver is updated to enable the OSPM and the userspace to
> > > > > access
> > > > > the newly supported registers.
> > > > >
> > > > > The purpose of exposing the registers via the sysfs entries is to 
> > > > > allow
> > > > > the userspace to:
> > > > > * Tweak the values to fit its workload.
> > > > > * Apply a profile from AMD's optimization guides.
> > > > >
> > > > > Profiles will be documented in the performance/optimization guides.
> > > > >
> > > > > Note:
> > > > > * AMD systems will not have a policy applied in the kernel at this 
> > > > > time.
> > > > > * By default, acpi_cpufreq will still be used.
> > > > >
> > > > > TODO:
> > > > > * Create a linux userspace tool that will help users generate a CPPC
> > > > > * profile
> > > > >for their target workload.
> > > > > * Create or update a driver to apply a general CPPC policy in the
> > > > > * kernel.
> > > > >
> > > > > v1->v2:
> > > > > * Add macro to ensure BUFFER only registers have BUFFER type.
> > > > > * Add support macro to make the right check based on register type.
> > > > > * Remove support checks for registers which are mandatory.
> > > >
> > > >
> > > > Are there any concerns regarding this patchset?
> > >
> > > Yes, there are.
> > >
> > > Unfortunately, it is generally problematic.
> > >
> > > First off, the behavior of existing sysfs files cannot be changed at
> > > will, as there may be users of them out there already depending on the
> > > current behavior.
> > >
> >
> > The intent is to add new sysfs files without changing the existing files. 
> > Is that okay?
> >
> > Or is the addition of new files also not acceptable?
> >
> > > Second, at least some CPPC control registers are used by cpufreq
> > > drivers (cppc_cpufreq and intel_pstate), so modifying them behind the
> > > drivers' backs is not a good idea in general.  For this reason, adding
> > > new sysfs attributes to allow user space to do that is quite
> > >

RE: [PATCH v2 0/7] CPPC optional registers AMD support

2019-04-17 Thread Ghannam, Yazen
> -Original Message-
> From: Rafael J. Wysocki 
> Sent: Tuesday, April 16, 2019 4:34 AM
> To: Natarajan, Janakarajan 
> Cc: Natarajan, Janakarajan ; 
> linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> p...@vger.kernel.org; de...@acpica.org; Rafael J . Wysocki 
> ; Len Brown ; Viresh Kumar
> ; Robert Moore ; Erik 
> Schmauss ; Ghannam, Yazen
> 
> Subject: Re: [PATCH v2 0/7] CPPC optional registers AMD support
> 
> On Tue, Apr 16, 2019 at 12:35 AM Janakarajan Natarajan  
> wrote:
> >
> > On 4/4/19 4:25 PM, Natarajan, Janakarajan wrote:
> > > CPPC (Collaborative Processor Performance Control) offers optional
> > > registers which can be used to tune the system based on energy and/or
> > > performance requirements.
> > >
> > > Newer AMD processors add support for a subset of these optional CPPC
> > > registers, based on ACPI v6.1.
> > >
> > > The following are the supported CPPC registers for which sysfs entries
> > > are created:
> > > * enable(NEW)
> > > * max_perf  (NEW)
> > > * min_perf  (NEW)
> > > * energy_perf
> > > * lowest_perf
> > > * nominal_perf
> > > * desired_perf  (NEW)
> > > * feedback_ctrs
> > > * auto_sel_enable   (NEW)
> > > * lowest_nonlinear_perf
> > >
> > > The CPPC driver is updated to enable the OSPM and the userspace to
> > > access
> > > the newly supported registers.
> > >
> > > The purpose of exposing the registers via the sysfs entries is to allow
> > > the userspace to:
> > > * Tweak the values to fit its workload.
> > > * Apply a profile from AMD's optimization guides.
> > >
> > > Profiles will be documented in the performance/optimization guides.
> > >
> > > Note:
> > > * AMD systems will not have a policy applied in the kernel at this time.
> > > * By default, acpi_cpufreq will still be used.
> > >
> > > TODO:
> > > * Create a linux userspace tool that will help users generate a CPPC
> > > * profile
> > >for their target workload.
> > > * Create or update a driver to apply a general CPPC policy in the
> > > * kernel.
> > >
> > > v1->v2:
> > > * Add macro to ensure BUFFER only registers have BUFFER type.
> > > * Add support macro to make the right check based on register type.
> > > * Remove support checks for registers which are mandatory.
> >
> >
> > Are there any concerns regarding this patchset?
> 
> Yes, there are.
> 
> Unfortunately, it is generally problematic.
> 
> First off, the behavior of existing sysfs files cannot be changed at
> will, as there may be users of them out there already depending on the
> current behavior.
> 

The intent is to add new sysfs files without changing the existing files. Is 
that okay?

Or is the addition of new files also not acceptable?

> Second, at least some CPPC control registers are used by cpufreq
> drivers (cppc_cpufreq and intel_pstate), so modifying them behind the
> drivers' backs is not a good idea in general.  For this reason, adding
> new sysfs attributes to allow user space to do that is quite
> questionable.
> 

Yes, good point.

What if a check is added so that writes only succeed if the CPUFREQ governor is 
set to userspace?
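
As a rough sketch, the store path could gate on the active governor like
this (the helper name is made up; the real hook would live in the CPPC
sysfs store functions):

	static bool cppc_userspace_owns_policy(struct cpufreq_policy *policy)
	{
		/* only allow CPPC register writes when the userspace
		 * governor is in charge of this policy */
		return policy && policy->governor &&
		       !strcmp(policy->governor->name, "userspace");
	}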

Thanks,
Yazen


[PATCH v2 0/6] Handle MCA banks in a per_cpu way

2019-04-11 Thread Ghannam, Yazen
From: Yazen Ghannam 

The focus of this patchset is define and use the MCA bank structures
and bank count per logical CPU.

With the exception of patch 4, this set applies to systems in production
today.

Patch 1:
Moves the declaration of struct mce_banks[] to the only file it's used.

Patch 2:
Splits struct mce_bank into a structure for fields common to MCA banks
on all CPUs and another structure that can be used per_cpu.

Patch 3:
Brings full circle the saga of the threshold block addresses on SMCA
systems. After taking a step back and reviewing the AMD documentation, I
think that this implementation is the simplest and most robust way to
follow the spec.

Patch 4:
Saves and uses the MCA bank count as a per_cpu variable. This is to
support systems that have MCA bank counts that are different between
logical CPUs.

Patch 5:
Makes sure that sysfs reports the MCA_CTL value as set in hardware. This
is not something related to making things per_cpu but rather just
something I noticed while working on the other patches.

Patch 6:
Prevents sysfs access for MCA banks that are uninitialized.
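
A distilled sketch of where the set ends up (simplified from the diffs
in the individual patches; the loop is just an illustrative example,
not a function from the set):

struct mce_bank {
	u64	ctl;	/* subevents to enable */
	bool	init;	/* initialise bank? */
};
static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks);
DEFINE_PER_CPU_READ_MOSTLY(u8, mce_num_banks);

static void example_scan_local_banks(void)
{
	u8 i;

	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
		struct mce_bank *b = &this_cpu_read(mce_banks)[i];

		/* Operate on this CPU's private view of bank i. */
		if (!b->init)
			continue;
	}
}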

Link:
https://lkml.kernel.org/r/20190408141205.12376-1-yazen.ghan...@amd.com

Thanks,
Yazen


Yazen Ghannam (6):
  x86/MCE: Make struct mce_banks[] static
  x86/MCE: Handle MCA controls in a per_cpu way
  x86/MCE/AMD: Don't cache block addresses on SMCA systems
  x86/MCE: Make number of MCA banks per_cpu
  x86/MCE: Save MCA control bits that get set in hardware
  x86/MCE: Treat MCE bank as initialized if control bits set in hardware

 arch/x86/kernel/cpu/mce/amd.c  |  87 +
 arch/x86/kernel/cpu/mce/core.c | 146 -
 arch/x86/kernel/cpu/mce/internal.h |  12 +--
 3 files changed, 144 insertions(+), 101 deletions(-)

-- 
2.17.1



[PATCH v2 3/6] x86/MCE/AMD: Don't cache block addresses on SMCA systems

2019-04-11 Thread Ghannam, Yazen
From: Yazen Ghannam 

On legacy systems, the addresses of the MCA_MISC* registers need to be
recursively discovered based on a Block Pointer field in the registers.

On Scalable MCA systems, the register space is fixed, and particular
addresses can be derived by regular offsets for bank and register type.
This fixed address space includes the MCA_MISC* registers.

MCA_MISC0 is always available for each MCA bank. MCA_MISC1 through
MCA_MISC4 are considered available if MCA_MISC0[BlkPtr]=1.

Cache the value of MCA_MISC0[BlkPtr] for each bank and per CPU. This
needs to be done only during init. The values should be saved per CPU
to accommodate heterogeneous SMCA systems.

Redo smca_get_block_address() to directly return the block addresses.
Use the cached Block Pointer value to decide if the MCA_MISC1-4
addresses should be returned.
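
For reference, the fixed register space means these addresses come from
simple arithmetic instead of discovery. The base and stride below
follow arch/x86/include/asm/msr-index.h; the MISC1-4 addresses are
likewise derived via MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1):

/* SMCA MSR space: 16 MSRs per bank, starting at 0xc0002000. */
#define MSR_AMD64_SMCA_MC0_MISC0	0xc0002003
#define MSR_AMD64_SMCA_MCx_MISC(x)	(MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))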

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190408141205.12376-4-yazen.ghan...@amd.com

v1->v2:
* No change.

 arch/x86/kernel/cpu/mce/amd.c | 71 +--
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index e64de5149e50..f0644b59848d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -101,11 +101,6 @@ static struct smca_bank_name smca_names[] = {
[SMCA_PCIE] = { "pcie", "PCI Express Unit" },
 };
 
-static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
-{
-   [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
-};
-
 static const char *smca_get_name(enum smca_bank_types t)
 {
if (t >= N_SMCA_BANK_TYPES)
@@ -198,6 +193,7 @@ static char buf_mcatype[MAX_MCATYPE_NAME_LEN];
 
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned int, bank_map); /* see which banks are on */
+static DEFINE_PER_CPU(u32, smca_blkptr_map); /* see which banks use BlkPtr */
 
 static void amd_threshold_interrupt(void);
 static void amd_deferred_error_interrupt(void);
@@ -208,6 +204,28 @@ static void default_deferred_error_interrupt(void)
 }
 void (*deferred_error_int_vector)(void) = default_deferred_error_interrupt;
 
+static void smca_set_blkptr_map(unsigned int bank, unsigned int cpu)
+{
+   u32 low, high;
+
+   /*
+* For SMCA enabled processors, BLKPTR field of the first MISC register
+* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
+*/
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
+   return;
+
+   if (!(low & MCI_CONFIG_MCAX))
+   return;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high))
+   return;
+
+   if (low & MASK_BLKPTR_LO)
+   per_cpu(smca_blkptr_map, cpu) |= 1 << bank;
+
+}
+
 static void smca_configure(unsigned int bank, unsigned int cpu)
 {
unsigned int i, hwid_mcatype;
@@ -245,6 +263,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
wrmsr(smca_config, low, high);
}
 
+   smca_set_blkptr_map(bank, cpu);
+
/* Return early if this bank was already initialized. */
if (smca_banks[bank].hwid)
return;
@@ -455,42 +475,21 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
-static u32 smca_get_block_address(unsigned int bank, unsigned int block)
+static u32 smca_get_block_address(unsigned int bank, unsigned int block,
+ unsigned int cpu)
 {
-   u32 low, high;
-   u32 addr = 0;
-
-   if (smca_get_bank_type(bank) == SMCA_RESERVED)
-   return addr;
-
if (!block)
return MSR_AMD64_SMCA_MCx_MISC(bank);
 
-   /* Check our cache first: */
-   if (smca_bank_addrs[bank][block] != -1)
-   return smca_bank_addrs[bank][block];
-
-   /*
-* For SMCA enabled processors, BLKPTR field of the first MISC register
-* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
-*/
-   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
-   goto out;
-
-   if (!(low & MCI_CONFIG_MCAX))
-   goto out;
-
-   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
-   (low & MASK_BLKPTR_LO))
-   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   if (!(per_cpu(smca_blkptr_map, cpu) & (1 << bank)))
+   return 0;
 
-out:
-   smca_bank_addrs[bank][block] = addr;
-   return addr;
+   return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
 }
 
 static u32 get_block_address(u32 current_addr, u32 low, u32 high,
-unsigned int bank, unsigned int block)
+unsigned int bank, unsigned int block,
+unsigned int cpu)
 {
u32 addr = 0, offset = 0;
 
@@ -498,7 +497,7 @@ static u32 

[PATCH v2 1/6] x86/MCE: Make struct mce_banks[] static

2019-04-11 Thread Ghannam, Yazen
From: Yazen Ghannam 

The struct mce_banks[] array is only used in mce/core.c, so move the
definition of struct mce_bank to mce/core.c and make the array static.

Also, change the "init" field to bool type.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190408141205.12376-2-yazen.ghan...@amd.com

v1->v2:
* No changes

 arch/x86/kernel/cpu/mce/core.c | 11 ++-
 arch/x86/kernel/cpu/mce/internal.h | 10 --
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 58925e7ccb27..8d0d1e8425db 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,7 +64,16 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
-struct mce_bank *mce_banks __read_mostly;
+#define ATTR_LEN   16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank {
+   u64                     ctl;            /* subevents to enable */
+   bool                    init;           /* initialise bank? */
+   struct device_attribute attr;           /* device attribute */
+   char                    attrname[ATTR_LEN]; /* attribute name */
+};
+
+static struct mce_bank *mce_banks __read_mostly;
 struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
diff --git a/arch/x86/kernel/cpu/mce/internal.h 
b/arch/x86/kernel/cpu/mce/internal.h
index af5eab1e65e2..032d52c66616 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -22,17 +22,8 @@ enum severity_level {
 
 extern struct blocking_notifier_head x86_mce_decoder_chain;
 
-#define ATTR_LEN   16
 #define INITIAL_CHECK_INTERVAL 5 * 60 /* 5 minutes */
 
-/* One object for each MCE bank, shared by all CPUs */
-struct mce_bank {
-   u64                     ctl;            /* subevents to enable */
-   unsigned char           init;           /* initialise bank? */
-   struct device_attribute attr;           /* device attribute */
-   char                    attrname[ATTR_LEN]; /* attribute name */
-};
-
 struct mce_evt_llist {
struct llist_node llnode;
struct mce mce;
@@ -47,7 +38,6 @@ struct llist_node *mce_gen_pool_prepare_records(void);
 extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool 
is_excp);
 struct dentry *mce_get_debugfs_dir(void);
 
-extern struct mce_bank *mce_banks;
 extern mce_banks_t mce_banks_ce_disabled;
 
 #ifdef CONFIG_X86_MCE_INTEL
-- 
2.17.1



[PATCH v2 2/6] x86/MCE: Handle MCA controls in a per_cpu way

2019-04-11 Thread Ghannam, Yazen
From: Yazen Ghannam 

Current AMD systems have unique MCA banks per logical CPU even though
the type of the banks may all align to the same bank number. Each CPU
will have control of a set of MCA banks in the hardware and these are
not shared with other CPUs.

For example, bank 0 may be the Load-Store Unit on every logical CPU, but
each bank 0 is a unique structure in the hardware. In other words, there
isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
share.

This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
may see a Unified Memory Controller at bank 15, but each CPU is actually
seeing a unique hardware structure that is not shared with other CPUs.

Because the MCA banks are all unique hardware structures, it would be
good to control them in a more granular way. For example, if there is a
known issue with the Floating Point Unit on CPU5 and a user wishes to
disable an error type on the Floating Point Unit, then it would be good
to do this only for CPU5 rather than all CPUs.

Also, future AMD systems may have heterogeneous MCA banks. Meaning the
bank numbers may not necessarily represent the same types between CPUs.
For example, bank 20 visible to CPU0 may be a Unified Memory Controller
and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
will be even more necessary should the user wish to control specific MCA
banks.

Split the device attributes from struct mce_bank leaving only the MCA
bank control fields.

Make struct mce_banks[] per_cpu in order to have more granular control
over individual MCA banks in the hardware.

Allocate the device attributes statically based on the maximum number of
MCA banks supported. The sysfs interface will use as many as needed per
CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
per_cpu bank count in a future patch.

Allocate the MCA control bits dynamically. Use the maximum number of MCA
banks supported for now. This will be changed to a per_cpu bank count in
a future patch.

Redo the sysfs store/show functions to handle the per_cpu mce_banks[].
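
Roughly, the show side goes from indexing the global array to indexing
the target CPU's array. A sketch (attr_to_bank() is the container_of()
helper mapping the attribute to its mce_bank_dev slot; bounds checking
is elided here):

static ssize_t show_bank(struct device *s, struct device_attribute *attr,
			 char *buf)
{
	u8 bank = attr_to_bank(attr)->bank;
	struct mce_bank *b = &per_cpu(mce_banks, s->id)[bank];

	return sprintf(buf, "%llx\n", b->ctl);
}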

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190408141205.12376-3-yazen.ghan...@amd.com

v1->v2:
* Change "struct mce_bank*" to "struct mce_bank *" in definition.

 arch/x86/kernel/cpu/mce/core.c | 77 ++
 1 file changed, 51 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8d0d1e8425db..aa41f41e5931 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
+struct mce_bank {
+   u64                     ctl;    /* subevents to enable */
+   bool                    init;   /* initialise bank? */
+};
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank *, mce_banks);
+
 #define ATTR_LEN   16
 /* One object for each MCE bank, shared by all CPUs */
-struct mce_bank {
-   u64                     ctl;    /* subevents to enable */
-   bool                    init;   /* initialise bank? */
+struct mce_bank_dev {
struct device_attribute attr;   /* device attribute */
char                    attrname[ATTR_LEN]; /* attribute name */
+   u8  bank;   /* bank number */
 };
+static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];
 
-static struct mce_bank *mce_banks __read_mostly;
 struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
@@ -695,7 +700,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
m.tsc = rdtsc();
 
for (i = 0; i < mca_cfg.banks; i++) {
-   if (!mce_banks[i].ctl || !test_bit(i, *b))
+   if (!this_cpu_read(mce_banks)[i].ctl || !test_bit(i, *b))
continue;
 
m.misc = 0;
@@ -1138,7 +1143,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
if (!test_bit(i, valid_banks))
continue;
 
-   if (!mce_banks[i].ctl)
+   if (!this_cpu_read(mce_banks)[i].ctl)
continue;
 
m->misc = 0;
@@ -1475,16 +1480,19 @@ static int __mcheck_cpu_mce_banks_init(void)
 {
int i;
 
-   mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
-   if (!mce_banks)
+   per_cpu(mce_banks, smp_processor_id()) =
+   kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
+
+   if (!this_cpu_read(mce_banks))
return -ENOMEM;
 
for (i = 0; i < MAX_NR_BANKS; i++) {
-   struct mce_bank *b = &mce_banks[i];
+   struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
b->ctl = -1ULL;
b->init = 1;
}
+
return 0;
 }
 
@@ -1504,7 +1512,7 @@ static 

[PATCH v2 4/6] x86/MCE: Make number of MCA banks per_cpu

2019-04-11 Thread Ghannam, Yazen
From: Yazen Ghannam 

The number of MCA banks is provided per logical CPU. Historically, this
number has been the same across all CPUs, but this is not an
architectural guarantee. Future AMD systems may have MCA bank counts
that vary between logical CPUs in a system.

This issue was partially addressed in

commit ("006c077041dc x86/mce: Handle varying MCA bank counts")

by allocating structures using the maximum number of MCA banks and by
saving the maximum MCA bank count in a system as the global count. This
means that some extra structures are allocated. Also, this means that
CPUs will spend more time in the #MC and other handlers checking extra
MCA banks.

Define the number of MCA banks as a per_cpu variable. Replace all uses
of mca_cfg.banks with this.

Also, use the per_cpu bank count when allocating per_cpu structures.

Print the number of banks per CPU as a debug message for those who may
be interested.
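
A sketch of the per-CPU count setup described above (the helper name is
illustrative; MCG_CAP[7:0] holds the bank count):

static void mce_set_num_banks(void)
{
	u64 cap;

	rdmsrl(MSR_IA32_MCG_CAP, cap);

	/* Clamp to MAX_NR_BANKS so preallocated structures still fit. */
	this_cpu_write(mce_num_banks,
		       min_t(u8, MAX_NR_BANKS, cap & MCG_BANKCNT_MASK));
}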

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190408141205.12376-5-yazen.ghan...@amd.com

v1->v2:
* Drop export of new variable and leave injector code as-is.
* Add "mce_" prefix to new "num_banks" variable.

 arch/x86/kernel/cpu/mce/amd.c  | 16 +-
 arch/x86/kernel/cpu/mce/core.c | 48 +-
 arch/x86/kernel/cpu/mce/internal.h |  2 +-
 3 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index f0644b59848d..2aed94f3a23e 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -493,7 +493,7 @@ static u32 get_block_address(u32 current_addr, u32 low, u32 high,
 {
u32 addr = 0, offset = 0;
 
-   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+   if ((bank >= per_cpu(mce_num_banks, cpu)) || (block >= NR_BLOCKS))
return addr;
 
if (mce_flags.smca)
@@ -605,7 +605,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
disable_err_thresholding(c);
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (mce_flags.smca)
smca_configure(bank, cpu);
 
@@ -948,7 +948,7 @@ static void amd_deferred_error_interrupt(void)
 {
unsigned int bank;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank)
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank)
log_error_deferred(bank);
 }
 
@@ -989,7 +989,7 @@ static void amd_threshold_interrupt(void)
struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
unsigned int bank, cpu = smp_processor_id();
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
 
@@ -1176,7 +1176,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
u32 low, high;
int err;
 
-   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+   if ((bank >= per_cpu(mce_num_banks, cpu)) || (block >= NR_BLOCKS))
return 0;
 
if (rdmsr_safe_on_cpu(cpu, address, &low, &high))
@@ -1410,7 +1410,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 {
unsigned int bank;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < per_cpu(mce_num_banks, cpu); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
threshold_remove_bank(cpu, bank);
@@ -1431,14 +1431,14 @@ int mce_threshold_create_device(unsigned int cpu)
if (bp)
return 0;
 
-   bp = kcalloc(mca_cfg.banks, sizeof(struct threshold_bank *),
+   bp = kcalloc(per_cpu(mce_num_banks, cpu), sizeof(struct threshold_bank *),
 GFP_KERNEL);
if (!bp)
return -ENOMEM;
 
per_cpu(threshold_banks, cpu) = bp;
 
-   for (bank = 0; bank < mca_cfg.banks; ++bank) {
+   for (bank = 0; bank < per_cpu(mce_num_banks, cpu); ++bank) {
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
err = threshold_create_bank(cpu, bank);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index aa41f41e5931..0fe29140ecab 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,6 +64,8 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
+DEFINE_PER_CPU_READ_MOSTLY(u8, mce_num_banks);
+
 struct mce_bank {
u64 ctl;/* subevents to enable */
boolinit;   /* initialise bank? */
@@ -699,7 +701,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (flags & MCP_TIMESTAMP)
m.tsc = rdtsc();
 
-   for (i = 0; i < mca_cfg.banks; i++) {
+   for (i = 0; i < 

[PATCH v2 5/6] x86/MCE: Save MCA control bits that get set in hardware

2019-04-11 Thread Ghannam, Yazen
From: Yazen Ghannam 

The OS is expected to write all bits in MCA_CTL. However, only
implemented bits get set in the hardware.

Read back MCA_CTL so that the value in the hardware is saved and
reported through sysfs.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190408141205.12376-6-yazen.ghan...@amd.com

v1->v2:
* No change.

 arch/x86/kernel/cpu/mce/core.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0fe29140ecab..71662133c70c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1565,10 +1565,13 @@ static void __mcheck_cpu_init_clear_banks(void)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
-   if (!b->init)
-   continue;
-   wrmsrl(msr_ops.ctl(i), b->ctl);
-   wrmsrl(msr_ops.status(i), 0);
+   if (b->init) {
+   wrmsrl(msr_ops.ctl(i), b->ctl);
+   wrmsrl(msr_ops.status(i), 0);
+   }
+
+   /* Save bits set in hardware. */
+   rdmsrl(msr_ops.ctl(i), b->ctl);
}
 }
 
@@ -2312,8 +2315,10 @@ static void mce_reenable_cpu(void)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
-   if (b->init)
+   if (b->init) {
wrmsrl(msr_ops.ctl(i), b->ctl);
+   rdmsrl(msr_ops.ctl(i), b->ctl);
+   }
}
 }
 
-- 
2.17.1



[PATCH v2 6/6] x86/MCE: Treat MCE bank as initialized if control bits set in hardware

2019-04-11 Thread Ghannam, Yazen
From: Yazen Ghannam 

The OS is expected to write all bits to MCA_CTL for each bank. However,
some banks may be unused, in which case the registers for such banks are
Read-as-Zero/Writes-Ignored. Also, the OS may not write any control bits
because of quirks, etc.

A bank can be considered uninitialized if the MCA_CTL register returns
zero. This is because either the OS did not write anything or because
the hardware is enforcing RAZ/WI for the bank.

Set a bank's init value based on if the control bits are set or not in
hardware.

Return an error code in the sysfs interface for uninitialized banks.

Signed-off-by: Yazen Ghannam 
---
Link:
https://lkml.kernel.org/r/20190408141205.12376-3-yazen.ghan...@amd.com

v1->v2:
* New in v2.
* Based on discussion from v1 patch 2.

 arch/x86/kernel/cpu/mce/core.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 71662133c70c..dcf5c6d72811 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1572,6 +1572,9 @@ static void __mcheck_cpu_init_clear_banks(void)
 
/* Save bits set in hardware. */
rdmsrl(msr_ops.ctl(i), b->ctl);
+
+   /* Bank is initialized if bits are set in hardware. */
+   b->init = !!b->ctl;
}
 }
 
@@ -2086,6 +2089,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
 
b = &per_cpu(mce_banks, s->id)[bank];
 
+   if (!b->init)
+   return -ENODEV;
+
return sprintf(buf, "%llx\n", b->ctl);
 }
 
@@ -2104,6 +2110,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 
b = &per_cpu(mce_banks, s->id)[bank];
 
+   if (!b->init)
+   return -ENODEV;
+
b->ctl = new;
mce_restart();
 
-- 
2.17.1



RE: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way

2019-04-10 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Borislav Petkov
> Sent: Wednesday, April 10, 2019 12:26 PM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> 
> On Wed, Apr 10, 2019 at 04:58:12PM +, Ghannam, Yazen wrote:
> > Yes, unused banks in the middle are counted in the MCG_CAP[Count] value.
> 
> Good.
> 
> > Okay, so you're saying the sysfs access should fail if a bank is
> > disabled. Is that correct?
> 
> Well, think about it. If a bank is not operational for whatever reason,
> we should tell the user that.
> 
> > Does "disabled" mean one or both of these?
> > Unused = RAZ/WI in hardware
> > Uninitialized = Not initialized by kernel due to quirks, etc.
> >
> > For an unused bank, it doesn't hurt to write MCA_CTL, but really
> > there's no reason to do so and go through mce_restart().
> 
> Yes, but that bank is non-operational in some form. So we should prevent
> all writes to it because, well, it is not going to do anything. And this
> would be a good way to give feedback to the user that that is the case.
> 
> > For an uninitialized bank, should we prevent users from overriding the
> > kernel's settings?
> 
> That all depends on the quirks. Whether we should allow them to be
> overridden or not. I don't think we've ever thought about it, though.
> 
> Let's look at one:
> 
> if (c->x86_vendor == X86_VENDOR_AMD) {
> if (c->x86 == 15 && cfg->banks > 4) {
> /*
>  * disable GART TBL walk error reporting, which
>  * trips off incorrectly with the IOMMU & 3ware
>  * & Cerberus:
>  */
> clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
> 
> 
> Yah, so if the user reenables those GART errors, then she/he will see a
> lot of MCEs reported and will maybe complain about it. And then we'll
> say, but why did you enable them then. And she/he'll say: uh, didn't
> know. Or, I was just poking at sysfs and this happened.
> 
> Then we can say, well, don't do that then! :-)
> 
> So my current position is, meh, who cares. But then I'm looking at
> another quirk:
> 
> if (c->x86_vendor == X86_VENDOR_INTEL) {
> /*
>  * SDM documents that on family 6 bank 0 should not be written
>  * because it aliases to another special BIOS controlled
>  * register.
>  * But it's not aliased anymore on model 0x1a+
>  * Don't ignore bank 0 completely because there could be a
>  * valid event later, merely don't write CTL0.
>  */
> 
> if (c->x86 == 6 && c->x86_model < 0x1A && cfg->banks > 0)
> mce_banks[0].init = 0;
> 
> 
> which basically prevents that bank from being reinitialized. So I guess
> we have that functionality already - we simply need to pay attention to
> w->init.
> 
> Right?

Okay, I'm with you.

So I'm thinking to add another patch to the set. This will set mce_bank.init=0 
if we read MCA_CTL=0 from the hardware.

Then we check if mce_bank.init=0 in the set/show functions and give a message 
if the bank is not used.

How does that sound?
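
Roughly along these lines (a sketch of the idea; this is essentially
what patches 5 and 6 of the v2 set above implement):

static void mce_mark_bank_init(int i)
{
	struct mce_bank *b = &this_cpu_read(mce_banks)[i];

	/* Save the control bits that actually stuck in hardware... */
	rdmsrl(msr_ops.ctl(i), b->ctl);

	/* ...and treat an all-zero MCA_CTL as "bank not in use". */
	b->init = !!b->ctl;
}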

Thanks,
Yazen


RE: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way

2019-04-10 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Wednesday, April 10, 2019 11:41 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> 
> On Wed, Apr 10, 2019 at 04:36:30PM +, Ghannam, Yazen wrote:
> > We have this case on AMD Family 17h with Bank 4. The hardware enforces
> > this bank to be Read-as-Zero/Writes-Ignored.
> >
> > This behavior is enforced whether the bank is in the middle or at the
> > end.
> 
> Does num_banks contain the disabled bank? If so, then it will work.
> 

Yes, unused banks in the middle are counted in the MCG_CAP[Count] value.

> > I'm thinking to redo the sysfs interface for banks in another patch
> > set. I could include a new file to indicate enabled/disabled, or maybe
> > just update the documentation to describe this case.
> 
> No, the write to the bank controls should fail on a disabled bank.
> 

Okay, so you're saying the sysfs access should fail if a bank is disabled. Is 
that correct?

Does "disabled" mean one or both of these?
Unused = RAZ/WI in hardware
Uninitialized = Not initialized by kernel due to quirks, etc.

For an unused bank, it doesn't hurt to write MCA_CTL, but really there's no 
reason to do so and go through mce_restart().

For an uninitialized bank, should we prevent users from overriding the kernel's 
settings?

Thanks,
Yazen


 


RE: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way

2019-04-10 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Tuesday, April 9, 2019 3:34 PM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> 
> On Mon, Apr 08, 2019 at 06:55:59PM +, Ghannam, Yazen wrote:
> > We already have the case where some banks are not initialized either
> > due to quirks or because they are Read-as-Zero, but we don't try to
> > skip creating their files. With this full set (see patch 5), an unused
> > bank will return a control value of 0.
> 
> So set_bank() is changed to do:
> 
> @@ -2088,7 +2097,7 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
> if (kstrtou64(buf, 0, &new) < 0)
> return -EINVAL;
> 
> -   if (bank >= mca_cfg.banks)
> +   if (bank >= per_cpu(num_banks, s->id))
> return -EINVAL;
> 
> 
> How would that work if the disabled/not-present bank is in the middle?
> The old example: bank3 on CPU5.
> 
> > Would that be sufficient to indicate that a bank is not used?
> 
> Well, it should not allow for any control bits to be set and it should
> have the proper bank number.
> 

We have this case on AMD Family 17h with Bank 4. The hardware enforces this 
bank to be Read-as-Zero/Writes-Ignored.

This behavior is enforced whether the bank is in the middle or at the end.

> > But I do have a couple of thoughts:
> 
> > 1) Will missing banks confuse users? As mentioned, we already have the
> > case of unused/uninitialized banks today, but we don't skip their file
> > creation.
> > a) Will this affect any userspace tools?
> 
> I guess it would be easier if we keep creating all files but denote properly
> which banks are disabled.
> 

I'm thinking to redo the sysfs interface for banks in another patch set. I 
could include a new file to indicate enabled/disabled, or maybe just update the 
documentation to describe this case.
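
If the extra file route is taken, it could be as small as a read-only
per-bank attribute; a hypothetical sketch (the name, the attr_to_bank()
lookup, and the init-based definition of "enabled" are made up here for
illustration):

static ssize_t show_bank_enabled(struct device *s,
				 struct device_attribute *attr, char *buf)
{
	struct mce_bank *b = &per_cpu(mce_banks, s->id)[attr_to_bank(attr)->bank];

	return sprintf(buf, "%d\n", b->init);
}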

Thanks,
Yazen


RE: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu

2019-04-08 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org  On 
> Behalf Of Luck, Tony
> Sent: Monday, April 8, 2019 6:23 PM
> To: Ghannam, Yazen 
> Cc: Borislav Petkov ; linux-e...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
> 
> On Mon, Apr 08, 2019 at 10:48:34PM +, Ghannam, Yazen wrote:
> > Okay, so drop the export and leave the injector code as-is (it's
> > already doing a rdmsrl_on_cpu()).
> 
> It's still a globally visible symbol (shared by core.c and amd.c).
> So I think it needs a "mce_" prefix.
> 
> While it doesn't collide now, there are a bunch of other
> subsystems that have "banks" and a variable to count them.
> 
> Look at output from "git grep -w num_banks".
> 

Okay, I'll add the prefix.

And thanks for the tip. I'll try to keep this in mind.

Thanks,
Yazen 

