From: Fenghua Yu <>

Change log in v4:

* Changed CONFIG_INTEL_RDT to CONFIG_RDT_A. Plain "RDT" refers to all
  resource features, both monitoring (CQM, MBM, ...) and control (CAT L3,
  L3/CDP, L2, ...). Adding the "_A" matches with the feature bit name
  for all the control features "X86_FEATURE_RDT_A".
* Nilay: Cleaned up duplicate declarations (and ones that appear in the
  wrong patch)
* Boris: Add comment on which specific Haswell models support CAT L3
* Boris: Use boot_cpu_data and boot_cpu_has()
* Don't call smp_call_function_single() from hot add notifier. We are already
  on the right cpu.
* Thomas (from earlier postings) check return value from rdtgroup_init() and
  cleanup if it failed
* Boris: Be more descriptive than "cache allocation" (we now say "RDT cache
  allocation")
* Nilay: Move some code out of mount() [mostly into closid_alloc()]
* Nilay: Drop wrapper functions rdtgroup_{alloc,free}, just call kzalloc/kfree
* smp_call_function_many() skips current cpu (to Tony's surprise). Make sure
  we make the call locally if current cpu is included in the mask.
* Also be preempt aware around calls to smp_call_function_many()
* Nilay: Add extra parens in:
        if ((attr == &dev_attr_id.attr) && (this_leaf->attributes & CACHE_ID))
* Added a comment for closid_alloc that current allocation is global and will
  do per resource domain allocation later.
* Added CAT L2 support and changed commit message in patch #8.

The patches are in the same order as V3 and have small commit message changes
in patches #7 and #8.


        These three define an "id" for each cache ... we need a "name"
        for a cache so we can say what restrictions to apply to each
        cache in the system.  All you will see at this point is an
        extra "id" file in each /sys/devices/system/cpu/cpu*/cache/index*/


        Look at CPUID for the features related to cache allocation.
        At this point /proc/cpuinfo shows extra flags for the features
        found on your system.


        The documentation patch could be anywhere in this sequence. We
        put it in early so you can read it to see how to use the
        interface.

        Add CONFIG_RDT_A (default "n" ... you'll have to set it for
        this, and all the following patches, to do anything).
        Template driver here just checks for features and spams
        the console with one line for each.


        There are some Haswell systems that support cache allocation,
        but they were made before the CPUID bits were fully defined.
        So we check by probing the CBM base MSR to see if CLOSID
        bits stick. Unless you have one of these Haswells, you won't
        see any difference here.


        This is all new code, not seen in the previous versions of this
        patch series. L3 and L2 cache allocations are just the first of
        several resource control features. Define rdt_resource structure
        that contains all the useful things we need to know about a
        resource. Pick up the parameters for the resource from CPUID.
        The console spam strings change format here.


        The PQR_ASSOC MSR has a field for the CLOSID (which we need
        to define which allocation rules are in effect). But it also
        contains the RMID (used by CQM and MBM perf monitoring).
        The perf code got here first, but defined structures that
        make it easy for the two systems to co-exist without stomping
        on each other. This patch moves the relevant parts into a
        common header file and changes the scope from "static" to
        global so we can access them. No visible change.


        For each enabled resource, we build a list of "rdt_domains" based
        on hotplug cpu notifications. Since we only have L3 at this point,
        this is just a list of L3 caches (named by the "id" established
        in the first three patches). As each cache is found we initialize
        the array of CBMs (cache bit masks). No visible change here.


        Our interface is a kernfs backed file system. Establish the
        mount point, and provide mount/unmount functionality.
        At this point "/sys/fs/resctrl" appears. You can mount and
        unmount the resctrl file system (if your system supports
        code/data prioritization, you can use the "cdp" mount option).
        The file system is empty and doesn't allow creation of any
        files or subdirectories.


        Parameters for each resource are buried in CPUID leaf 0x10.
        This isn't very user friendly for scripts and applications
        that want to configure resource allocation. Create an
        "info" directory, with a subdirectory for each resource
        containing a couple of useful parameters. Visible change:
        $ ls -l /sys/fs/resctrl/info/L3
        total 0
        -r--r--r-- 1 root root 0 Oct  7 11:20 cbm_val
        -r--r--r-- 1 root root 0 Oct  7 11:20 num_closid


        Each resource group is represented by a directory in the
        resctrl file system. The root directory is the default group.
        Use "mkdir" to create new groups and "rmdir" to remove them.
        The maximum number of groups is defined by the effective
        number of CLOSIDs.
        Visible change: If you have CDP (and enable with the "cdp"
        mount option) you will find that you can only create half
        as many groups as without (e.g. 8 vs. 16 on Broadwell, but
        the default group uses one ... so actually 7, 15).


        One of the control mechanisms for a resource group is the
        logical CPU. Initially all CPUs are assigned to the default
        group. They can be reassigned to other groups by writing
        a cpumask to the "cpus" file. See the documentation for what
        this means.
        Visible change: "cpus" file in the root, and automatically
        in each created subdirectory. You can "echo" masks to these
        files and watch as CPUs added to one group are removed from
        whatever group they previously belonged to. Removing a directory
        will give all CPUs owned by it back to the default (root)
        group.

        Tasks can be assigned to resource groups by writing their PID
        to a "tasks" file (which removes the task from its previous
        group). Forked/cloned tasks inherit the group from their
        parent. You cannot remove a group (directory) that has any
        tasks assigned.
        Visible change: "tasks" files appear. E.g. (we see two tasks
        in the group, our shell, and the "cat" that it spawned).
        # echo $$ > p0/tasks; cat p0/tasks


        The "schemata" file in each group/directory defines what
        access the tasks controlled by this resource group are
        permitted. One line per resource type. Fields for each
        instance of the resource. You redefine the access by writing
        to the file in the same format.
        Visible change: "schemata" file which starts out with maximum
        allowed resources. E.g.
        $ cat schemata
        Now restrict this group to just 20% of L3 on first cache, but
        allow 50% on the second
        # echo "L3:0=f;1=3ff" > schemata


        When context switching we check if we are changing resource
        groups for the new process, and update the PQR_ASSOC MSR with
        the new CLOSID if needed.
        Visible change: Everything should be working now. Tasks run with
        the permitted access to L3 cache.


        New files ... need a maintainer. Fenghua has the job.

Fenghua Yu (15):
  Documentation, ABI: Add a document entry for cache id
  cacheinfo: Introduce cache id
  x86, intel_cacheinfo: Enable cache id in x86
  x86/intel_rdt: Feature discovery
  Documentation, x86: Documentation for Intel resource allocation user
  x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
  x86/intel_rdt: Add Haswell feature discovery
  x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID
  x86/cqm: Move PQR_ASSOC management code into generic code used by both
    CQM and CAT
  x86/intel_rdt: Add basic resctrl filesystem support
  x86/intel_rdt: Add "info" files to resctrl file system
  x86/intel_rdt: Add mkdir to resctrl file system
  x86/intel_rdt: Add tasks files
  x86/intel_rdt: Add scheduler hook
  MAINTAINERS: Add maintainer for Intel RDT resource allocation

Tony Luck (3):
  x86/intel_rdt: Build structures for each resource based on cache
  x86/intel_rdt: Add cpus file
  x86/intel_rdt: Add schemata file

 Documentation/ABI/testing/sysfs-devices-system-cpu |  16 +
 Documentation/x86/intel_rdt_ui.txt                 | 162 ++++
 MAINTAINERS                                        |   8 +
 arch/x86/Kconfig                                   |  12 +
 arch/x86/events/intel/cqm.c                        |  23 +-
 arch/x86/include/asm/cpufeatures.h                 |   5 +
 arch/x86/include/asm/intel_rdt.h                   | 206 +++++
 arch/x86/include/asm/intel_rdt_common.h            |  27 +
 arch/x86/kernel/cpu/Makefile                       |   2 +
 arch/x86/kernel/cpu/intel_cacheinfo.c              |  20 +
 arch/x86/kernel/cpu/intel_rdt.c                    | 304 +++++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c           | 973 +++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_schemata.c           | 266 ++++++
 arch/x86/kernel/cpu/scattered.c                    |   3 +
 arch/x86/kernel/process_32.c                       |   4 +
 arch/x86/kernel/process_64.c                       |   4 +
 drivers/base/cacheinfo.c                           |   5 +
 include/linux/cacheinfo.h                          |   3 +
 include/linux/sched.h                              |   3 +
 include/uapi/linux/magic.h                         |   1 +
 20 files changed, 2026 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/x86/intel_rdt_ui.txt
 create mode 100644 arch/x86/include/asm/intel_rdt.h
 create mode 100644 arch/x86/include/asm/intel_rdt_common.h
 create mode 100644 arch/x86/kernel/cpu/intel_rdt.c
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_schemata.c

