Intel Memory Bandwidth Monitoring (MBM) counters may report system
memory bandwidth incorrectly on some Intel processors. The problem is
described in erratum SKX99 [1], erratum BDF102 [2], and the Intel RDT
reference manual [3].

To work around the errata, MBM total and local readings are corrected
using a correction factor table.

Because the correction factor table is not publicly documented
anywhere, document the table and the errata in
Documentation/x86/resctrl.rst for future reference. Rename the file
from Documentation/x86/resctrl_ui.rst because it no longer describes
only the user interface.

1. Erratum SKX99 in Intel Xeon Processor Scalable Family Specification
   Update:
http://web.archive.org/web/20200716124958/https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
2. Erratum BDF102 in Intel Xeon E5-2600 v4 Processor Product Family
   Specification Update:
http://web.archive.org/web/20191125200531/https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v4-spec-update.pdf
3. The errata in Intel Resource Director Technology (Intel RDT) on 2nd
   Generation Intel Xeon Scalable Processors Reference Manual:
https://software.intel.com/content/www/us/en/develop/articles/intel-resource-director-technology-rdt-reference-manual.html

Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
Change Log:
v2:
- Document the correction factor table and errata in resctrl.rst (Boris).
- Change the documentation URLs to stable archive.org (Tony).
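
For reviewers, the rule documented in resctrl.rst ("if rmid > rmid
threshold, multiply by the correction factor") can be sketched in
user-space C roughly as below. This is only an illustration under stated
assumptions, not the kernel implementation: the struct, the function
names, the lookup by rmid count, and the parts-per-million encoding of
the factor are all hypothetical choices made here to avoid floating
point. Only four rows of the table are reproduced.

```c
#include <stddef.h>
#include <stdint.h>

#define MBM_CF_PPM 1000000ULL	/* hypothetical scale: factor * 10^6 */

/* Hypothetical row type mirroring the columns of the documented table. */
struct mbm_cf_row {
	unsigned int core_count;
	unsigned int rmid_count;
	unsigned int rmid_threshold;
	uint64_t factor_ppm;	/* correction factor in parts per million */
};

/* A few rows copied from the table; a full version would list them all. */
static const struct mbm_cf_row mbm_cf_rows[] = {
	{  3,  24,  15,  969650 },
	{  8,  64,   0, 1000000 },
	{  9,  72,  63, 1185115 },
	{ 28, 224, 191, 1143118 },
};

/* Find the row for a given rmid count, or NULL if it is not listed. */
static const struct mbm_cf_row *mbm_cf_lookup(unsigned int rmid_count)
{
	size_t i;

	for (i = 0; i < sizeof(mbm_cf_rows) / sizeof(mbm_cf_rows[0]); i++)
		if (mbm_cf_rows[i].rmid_count == rmid_count)
			return &mbm_cf_rows[i];
	return NULL;
}

/*
 * Apply the documented rule: scale the raw MBM reading only when
 * rmid is above the row's threshold. The multiplication is split so
 * that the intermediate product does not overflow for readings whose
 * corrected value still fits in 64 bits.
 */
static uint64_t mbm_correct(const struct mbm_cf_row *row,
			    unsigned int rmid, uint64_t raw)
{
	if (!row || rmid <= row->rmid_threshold)
		return raw;

	return (raw / MBM_CF_PPM) * row->factor_ppm +
	       (raw % MBM_CF_PPM) * row->factor_ppm / MBM_CF_PPM;
}
```

With the 224-RMID row, a raw reading of 1000000 for rmid 200 would be
scaled to 1143118, while the same reading for rmid 100 (at or below the
threshold of 191) would be returned unchanged.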

 Documentation/conf.py                         |  2 +-
 Documentation/x86/index.rst                   |  2 +-
 .../x86/{resctrl_ui.rst => resctrl.rst}       | 82 +++++++++++++++++++
 3 files changed, 84 insertions(+), 2 deletions(-)
 rename Documentation/x86/{resctrl_ui.rst => resctrl.rst} (92%)

diff --git a/Documentation/conf.py b/Documentation/conf.py
index c503188880d9..b5b2be8eec22 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -36,7 +36,7 @@ needs_sphinx = '1.3'
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
+extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include',
               'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
               'maintainers_include', 'sphinx.ext.autosectionlabel' ]
 
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..49d2fd9f0e5b 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -25,7 +25,7 @@ x86-specific Documentation
    pti
    mds
    microcode
-   resctrl_ui
+   resctrl
    tsx_async_abort
    usb-legacy-support
    i386/index
diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl.rst
similarity index 92%
rename from Documentation/x86/resctrl_ui.rst
rename to Documentation/x86/resctrl.rst
index e59b7b93a9b4..8b8ca6de5e1f 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl.rst
@@ -1209,3 +1209,85 @@ View the llc occupancy snapshot::
 
   # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
   11234000
+
+Intel RDT Errata
+================
+
+Intel MBM Counters May Report System Memory Bandwidth Incorrectly
+-----------------------------------------------------------------
+
+Errata SKX99 for Skylake server and BDF102 for Broadwell server.
+
+Problem: Intel Memory Bandwidth Monitoring (MBM) counters track metrics
+according to the assigned Resource Monitor ID (RMID) for that logical core.
+The IA32_QM_CTR register (MSR 0xC8E), used to report these metrics, may
+report incorrect system bandwidth for certain RMID values.
+
+Implication: Due to the errata, system memory bandwidth may not match
+what is reported.
+
+Workaround: The kernel works around the errata.
+
+MBM total and local readings are corrected by the following correction
+factor table for the errata:
+
++---------------+---------------+---------------+-----------------+
+|core count     |rmid count     |rmid threshold |correction factor|
++---------------+---------------+---------------+-----------------+
+|1              |8              |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|2              |16             |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|3              |24             |15             |0.969650         |
++---------------+---------------+---------------+-----------------+
+|4              |32             |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|6              |48             |31             |0.969650         |
++---------------+---------------+---------------+-----------------+
+|7              |56             |47             |1.142857         |
++---------------+---------------+---------------+-----------------+
+|8              |64             |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|9              |72             |63             |1.185115         |
++---------------+---------------+---------------+-----------------+
+|10             |80             |63             |1.066553         |
++---------------+---------------+---------------+-----------------+
+|11             |88             |79             |1.454545         |
++---------------+---------------+---------------+-----------------+
+|12             |96             |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|13             |104            |95             |1.230769         |
++---------------+---------------+---------------+-----------------+
+|14             |112            |95             |1.142857         |
++---------------+---------------+---------------+-----------------+
+|15             |120            |95             |1.066667         |
++---------------+---------------+---------------+-----------------+
+|16             |128            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|17             |136            |127            |1.254863         |
++---------------+---------------+---------------+-----------------+
+|18             |144            |127            |1.185255         |
++---------------+---------------+---------------+-----------------+
+|19             |152            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|20             |160            |127            |1.066667         |
++---------------+---------------+---------------+-----------------+
+|21             |168            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|22             |176            |159            |1.454334         |
++---------------+---------------+---------------+-----------------+
+|23             |184            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|24             |192            |127            |0.969744         |
++---------------+---------------+---------------+-----------------+
+|25             |200            |191            |1.280246         |
++---------------+---------------+---------------+-----------------+
+|26             |208            |191            |1.230921         |
++---------------+---------------+---------------+-----------------+
+|27             |216            |0              |1.000000         |
++---------------+---------------+---------------+-----------------+
+|28             |224            |191            |1.143118         |
++---------------+---------------+---------------+-----------------+
+
+If rmid > rmid threshold, MBM total and local values should be multiplied
+by the correction factor.
-- 
2.28.0
