On Fri, 13 Mar 2026, Reinette Chatre wrote: > Dave Martin reported inconsistent CMT test failures. In one experiment > the first run of the CMT test failed because of too large (24%) difference > between measured and achievable cache occupancy while the second run passed > with an acceptable 4% difference. > > The CMT test is susceptible to interference from the rest of the system. > This can be demonstrated with a utility like stress-ng by running the CMT > test while introducing cache misses using: > > stress-ng --matrix-3d 0 --matrix-3d-zyx > > Below shows an example of the CMT test failing because of a significant > difference between measured and achievable cache occupancy when run with > interference: > # Starting CMT test ... > # Mounting resctrl to "/sys/fs/resctrl" > # Cache size :335544320 > # Writing benchmark parameters to resctrl FS > # Benchmark PID: 7011 > # Checking for pass/fail > # Fail: Check cache miss rate within 15% > # Percent diff=99 > # Number of bits: 5 > # Average LLC val: 235929 > # Cache span (bytes): 83886080 > not ok 1 CMT: test > > The CMT test creates a new control group that is also capable of monitoring > and assigns the workload to it. The workload allocates a buffer that by > default fills a portion of the L3 and keeps reading from the buffer, > measuring the L3 occupancy at intervals. The test passes if the workload's > L3 occupancy is within 15% of the buffer size. > > By not adjusting any capacity bitmasks the workload shares the cache with > the rest of the system. Any other task that may be running could evict > the workload's data from the cache causing it to have low cache occupancy. > > Reduce interference from the rest of the system by ensuring that the > workload's control group uses the capacity bitmask found in the user > parameters for L3 and that the rest of the system can only allocate into > the inverse of the workload's L3 cache portion. Other tasks can thus no > longer evict the workload's data from L3. > > With the above adjustments the CMT test is more consistent. Repeating the > CMT test while generating interference with stress-ng on a sample > system after applying the fixes show significant improvement in test > accuracy: > > # Starting CMT test ... > # Mounting resctrl to "/sys/fs/resctrl" > # Cache size :335544320 > # Writing benchmark parameters to resctrl FS > # Write schema "L3:0=fffe0" to resctrl FS > # Write schema "L3:0=1f" to resctrl FS > # Benchmark PID: 7089 > # Checking for pass/fail > # Pass: Check cache miss rate within 15% > # Percent diff=12 > # Number of bits: 5 > # Average LLC val: 73269248 > # Cache span (bytes): 83886080 > ok 1 CMT: test > > Reported-by: Dave Martin <[email protected]> > Signed-off-by: Reinette Chatre <[email protected]> > Tested-by: Chen Yu <[email protected]> > Link: https://lore.kernel.org/lkml/[email protected]/ > --- > Changes since v1: > - Fix typo in changelog: "data my be in L2" -> "data may be in L2". > > Changes since v2: > - Split patch to separate changes impacting L3 and L2 resource. (Ilpo) > - Re-run tests after patch split to ensure test impact match patch > and update changelog with refreshed data. > - Since fix is now split across two patches: "Closes:" -> "Link:" > - Rename "long_mask" to "full_mask". (Ilpo) > - Add Chen Yu's tag. > --- > tools/testing/selftests/resctrl/cmt_test.c | 26 +++++++++++++++++-- > tools/testing/selftests/resctrl/mba_test.c | 4 ++- > tools/testing/selftests/resctrl/mbm_test.c | 4 ++- > tools/testing/selftests/resctrl/resctrl.h | 4 ++- > tools/testing/selftests/resctrl/resctrl_val.c | 2 +- > 5 files changed, 34 insertions(+), 6 deletions(-) > > diff --git a/tools/testing/selftests/resctrl/cmt_test.c > b/tools/testing/selftests/resctrl/cmt_test.c > index d09e693dc739..7bc6cf49c1c5 100644 > --- a/tools/testing/selftests/resctrl/cmt_test.c > +++ b/tools/testing/selftests/resctrl/cmt_test.c > @@ -19,12 +19,34 @@ > #define CON_MON_LCC_OCCUP_PATH \ > "%s/%s/mon_data/mon_L3_%02d/llc_occupancy" > > -static int cmt_init(const struct resctrl_val_param *param, int domain_id) > +/* > + * Initialize capacity bitmasks (CBMs) of: > + * - control group being tested per test parameters, > + * - default resource group as inverse of control group being tested to > prevent > + * other tasks from interfering with test. > + */ > +static int cmt_init(const struct resctrl_test *test, > + const struct user_params *uparams, > + const struct resctrl_val_param *param, int domain_id) > { > + unsigned long full_mask; > + char schemata[64]; > + int ret; > + > sprintf(llc_occup_path, CON_MON_LCC_OCCUP_PATH, RESCTRL_PATH, > param->ctrlgrp, domain_id); > > - return 0; > + ret = get_full_cbm(test->resource, &full_mask); > + if (ret) > + return ret; > + > + snprintf(schemata, sizeof(schemata), "%lx", ~param->mask & full_mask); > + ret = write_schemata("", schemata, uparams->cpu, test->resource); > + if (ret) > + return ret; > + > + snprintf(schemata, sizeof(schemata), "%lx", param->mask); > + return write_schemata(param->ctrlgrp, schemata, uparams->cpu, > test->resource); > } > > static int cmt_setup(const struct resctrl_test *test, > diff --git a/tools/testing/selftests/resctrl/mba_test.c > b/tools/testing/selftests/resctrl/mba_test.c > index c7e9adc0368f..cd4c715b7ffd 100644 > --- a/tools/testing/selftests/resctrl/mba_test.c > +++ b/tools/testing/selftests/resctrl/mba_test.c > @@ -17,7 +17,9 @@ > #define ALLOCATION_MIN 10 > #define ALLOCATION_STEP 10 > > -static int mba_init(const struct resctrl_val_param *param, int domain_id) > +static int mba_init(const struct resctrl_test *test, > + const struct user_params *uparams, > + const struct resctrl_val_param *param, int domain_id) > { > int ret; > > diff --git a/tools/testing/selftests/resctrl/mbm_test.c > b/tools/testing/selftests/resctrl/mbm_test.c > index 84d8bc250539..58201f844740 100644 > --- a/tools/testing/selftests/resctrl/mbm_test.c > +++ b/tools/testing/selftests/resctrl/mbm_test.c > @@ -83,7 +83,9 @@ static int check_results(size_t span) > return ret; > } > > -static int mbm_init(const struct resctrl_val_param *param, int domain_id) > +static int mbm_init(const struct resctrl_test *test, > + const struct user_params *uparams, > + const struct resctrl_val_param *param, int domain_id) > { > int ret; > > diff --git a/tools/testing/selftests/resctrl/resctrl.h > b/tools/testing/selftests/resctrl/resctrl.h > index afe635b6e48d..c72045c74ac4 100644 > --- a/tools/testing/selftests/resctrl/resctrl.h > +++ b/tools/testing/selftests/resctrl/resctrl.h > @@ -135,7 +135,9 @@ struct resctrl_val_param { > char filename[64]; > unsigned long mask; > int num_of_runs; > - int (*init)(const struct resctrl_val_param *param, > + int (*init)(const struct resctrl_test *test, > + const struct user_params *uparams, > + const struct resctrl_val_param *param, > int domain_id); > int (*setup)(const struct resctrl_test *test, > const struct user_params *uparams, > diff --git a/tools/testing/selftests/resctrl/resctrl_val.c > b/tools/testing/selftests/resctrl/resctrl_val.c > index 7c08e936572d..a5a8badb83d4 100644 > --- a/tools/testing/selftests/resctrl/resctrl_val.c > +++ b/tools/testing/selftests/resctrl/resctrl_val.c > @@ -569,7 +569,7 @@ int resctrl_val(const struct resctrl_test *test, > goto reset_affinity; > > if (param->init) { > - ret = param->init(param, domain_id); > + ret = param->init(test, uparams, param, domain_id); > if (ret) > goto reset_affinity; > } >
Reviewed-by: Ilpo Järvinen <[email protected]> -- i.

