On 2/19/2018 3:19 PM, Thomas Gleixner wrote:
> On Mon, 19 Feb 2018, Reinette Chatre wrote:
>> On 2/19/2018 1:19 PM, Thomas Gleixner wrote:
>>> On Tue, 13 Feb 2018, Reinette Chatre wrote:
>>>> After a pseudo-locked region is locked it needs to be associated with
>>>> the RDT domain representing the pseudo-locked cache so that its life
>>>> cycle can be managed correctly.
>>>> Only a single pseudo-locked region can exist on any cache instance so we
>>>> maintain a single pointer to a pseudo-locked region from each RDT
>>> Why is only a single pseudo locked region possible?
>> The setup of a pseudo-locked region requires the usage of wbinvd. If a
>> second pseudo-locked region is thus attempted it will evict the
>> pseudo-locked data of the first.
> Why does it need wbinvd? wbinvd is a big hammer. What's wrong with clflush?

wbinvd is required by this hardware-supported feature, but its use is
limited to the creation of the pseudo-locked region. An administrator
could dedicate a portion of cache to pseudo-locking, and applications
using this region can come and go. The pseudo-locked region lifetime
need not be tied to application lifetime. The pseudo-locked region
could be set up once at boot and remain for the lifetime of the system.

Even so, understanding that it is a big hammer, I did explore the
alternatives: clflush, clflushopt, and clwb. Finding them all to perform
poorly(*), I went further to explore whether these other instructions
could be made to perform as well as wbinvd with some additional
supporting work. The additional work included looping over the data
more times than done for wbinvd, reducing the size
of memory locked relative to cache size, unused spacing between
pseudo-locked region and other regions, unmapped memory at end of

In addition to the above research on my side, I also followed up with
the CPU architects directly to ask whether these instructions could be
used instead of wbinvd.

In all the testing and questioning I did, I was only able to confirm
that wbinvd is required. Its use consistently results in the fewest
cache misses to the created pseudo-locked region.

(*) By poorly I mean that accessing the pseudo-locked region created
using these instructions resulted in significant cache misses.
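
For anyone not familiar with the difference between the two approaches,
here is a rough userspace sketch of what a clflush-based flush of a
memory range looks like. It is only an illustration and is not the code
from this patch series: the function name flush_region_clflush() and the
CACHE_LINE_SIZE constant are made up for the example, a 64-byte line
size is assumed, and it only builds on x86 with SSE2. wbinvd has no
userspace equivalent since it is a privileged instruction that writes
back and invalidates the entire cache rather than individual lines.

```c
#include <emmintrin.h>  /* _mm_clflush(), _mm_mfence() (SSE2, x86 only) */
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical helper: flush every cache line that overlaps
 * [addr, addr + size) using clflush, one line at a time.
 * Returns the number of lines flushed.
 */
static size_t flush_region_clflush(const void *addr, size_t size)
{
	uintptr_t start, end, p;
	size_t lines = 0;

	if (size == 0)
		return 0;

	/* Round the start down to a cache line boundary. */
	start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
	end = (uintptr_t)addr + size;

	_mm_mfence();	/* order earlier stores before the flushes */
	for (p = start; p < end; p += CACHE_LINE_SIZE) {
		_mm_clflush((const void *)p);
		lines++;
	}
	_mm_mfence();	/* wait for the flushes to complete */

	return lines;
}
```

Note that this only evicts the lines of the given range, which is the
targeted behavior asked about above. It does not by itself say anything
about the cache miss results I reported.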