On Fri, 6 Mar 2026, Reinette Chatre wrote:
> On 3/6/26 1:47 AM, Ilpo Järvinen wrote:
> > On Tue, 3 Mar 2026, Reinette Chatre wrote:
> > 
> >> Dave Martin reported inconsistent CMT test failures. In one experiment
> > >> the first run of the CMT test failed because of a too large (24%) difference
> >> between measured and achievable cache occupancy while the second run passed
> >> with an acceptable 4% difference.
> >>
> >> The CMT test is susceptible to interference from the rest of the system.
> >> This can be demonstrated with a utility like stress-ng by running the CMT
> >> test while introducing cache misses using:
> >>
> >>    stress-ng --matrix-3d 0 --matrix-3d-zyx
> >>
> > >> Below is an example of the CMT test failing because of a significant
> >> difference between measured and achievable cache occupancy when run with
> >> interference:
> >>     # Starting CMT test ...
> >>     # Mounting resctrl to "/sys/fs/resctrl"
> >>     # Cache size :56623104
> >>     # Writing benchmark parameters to resctrl FS
> >>     # Benchmark PID: 3275
> >>     # Checking for pass/fail
> >>     # Fail: Check cache miss rate within 15%
> >>     # Percent diff=97
> >>     # Number of bits: 5
> >>     # Average LLC val: 501350
> >>     # Cache span (bytes): 23592960
> >>     not ok 1 CMT: test
> >>
> >> The CMT test creates a new control group that is also capable of monitoring
> >> and assigns the workload to it. The workload allocates a buffer that by
> >> default fills a portion of the L3 and keeps reading from the buffer,
> >> measuring the L3 occupancy at intervals. The test passes if the workload's
> >> L3 occupancy is within 15% of the buffer size.
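
For reference, a minimal sketch of the pass criterion described above, in
Python rather than the selftest's C. The names percent_diff() and
cmt_passes() are hypothetical, and both the truncation to a whole percent
and the "<=" comparison are assumptions, chosen only because truncation
reproduces the "Percent diff=97" line in the log above:

```python
def percent_diff(measured: int, span: int) -> int:
    """Relative difference between measured occupancy and buffer size,
    truncated to a whole percent (assumed rounding)."""
    return int(abs(measured - span) / span * 100)


def cmt_passes(measured: int, span: int, max_diff_percent: int = 15) -> bool:
    """True if the occupancy is within max_diff_percent of the buffer size
    (the inclusive comparison is an assumption)."""
    return percent_diff(measured, span) <= max_diff_percent


# Values taken from the failing log quoted above.
avg_llc_val = 501350      # "Average LLC val"
cache_span = 23592960     # "Cache span (bytes)"

print(percent_diff(avg_llc_val, cache_span))  # 97
print(cmt_passes(avg_llc_val, cache_span))    # False
```

With interference, the measured occupancy is about 2% of the buffer size,
so the difference lands at 97% and the 15% check fails.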
> >>
> >> By not adjusting any capacity bitmasks the workload shares the cache with
> >> the rest of the system. Any other task that may be running could evict
> >> the workload's data from the cache causing it to have low cache occupancy.
> >>
> >> Reduce interference from the rest of the system by ensuring that the
> >> workload's control group uses the capacity bitmask found in the user
> >> parameters for L3 and that the rest of the system can only allocate into
> >> the inverse of the workload's L3 cache portion. Other tasks can thus no
> >> longer evict the workload's data from L3.
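
The mask split described above can be sketched as follows, again in Python
for illustration only. inverse_cbm() is a hypothetical helper. The 12-bit
full mask is inferred from the log above (56623104 / 12 = 4718592 bytes per
bit, times 5 bits = 23592960), and placing the workload's 5 bits at the low
end of the mask is purely an assumption:

```python
def inverse_cbm(workload_mask: int, full_mask: int) -> int:
    """Return the bits of full_mask that the workload does not use."""
    if workload_mask & ~full_mask:
        raise ValueError("workload mask has bits outside the full mask")
    return full_mask & ~workload_mask


full_mask = 0xfff       # assumed: 12-bit L3 capacity bitmask
workload_mask = 0x01f   # assumed: workload gets the 5 lowest bits

rest_mask = inverse_cbm(workload_mask, full_mask)

print(f"workload L3 mask: {workload_mask:x}")  # 1f
print(f"rest of system:   {rest_mask:x}")      # fe0
```

Since the two masks share no bits, other tasks have no L3 portion in
common with the workload to allocate into.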
> >>
> >> Take the L2 cache into account to further improve test accuracy.
> >> By default the buffer size is the same as the L3 portion that the workload
> >> can allocate into. This buffer size does not take into account that some
> >> of the workload's data may land in L2/L1. Address this in two ways:
> >>  - Reduce the amount of L2 cache the workload can allocate into to the
> > 
> > "into to the" sounds wrong.
> 
> How about:
>   "Reduce the workload's L2 cache allocation to the minimum on systems that
>    support L2 cache allocation."

Works for me.

-- 
 i.
