Hi folks. I thought I'd do a quick measurement of the effectiveness of the
various caches which cache StaticInsts from the decoder using X86.

For SE mode, I did a very simple run of hello world. For such a short
program, the hit rate was pretty low and, I think, not representative. To
get something a little more realistic, I did a Linux boot up to the point
of mounting the root FS, and got better results:

addr_map.accesses                           224432609
# Number of accesses to the addr map decoder cache
addr_map.hit_rate                                1.00
# Hit rate in the addr map cache
addr_map.hits                               224364693
# Number of hits in the addr map cache

This is for the cache which looks up what memory is at a given address and,
if the bytes match what was there the last time, returns the same
StaticInst. As you can see, the hit rate is very high.
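To make the mechanism concrete, here's a minimal sketch of what such an address-keyed cache could look like. All the names (AddrMapCache, Entry, etc.) are invented for the example, and the real gem5 version is more involved (variable-length x86 instructions, contextualizing state like the operating mode); this just shows the address-keyed, bytes-verified lookup:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical, simplified stand-ins for gem5's real types. A real x86
// instruction can be up to 15 bytes; a fixed uint64_t keeps the sketch simple.
struct StaticInst { uint64_t machInst; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// Sketch of an address-keyed decode cache: each entry remembers the
// instruction bytes seen at that address last time, and the cached
// StaticInst is reused only if the bytes still match.
class AddrMapCache
{
  public:
    // Return the cached StaticInst if the bytes at 'addr' are unchanged,
    // otherwise "decode" (here: construct) a new one and cache it.
    StaticInstPtr
    lookup(uint64_t addr, uint64_t bytes)
    {
        accesses++;
        auto it = cache.find(addr);
        if (it != cache.end() && it->second.bytes == bytes) {
            hits++;
            return it->second.inst;
        }
        auto inst = std::make_shared<StaticInst>(StaticInst{bytes});
        cache[addr] = {bytes, inst};
        return inst;
    }

    uint64_t accesses = 0;
    uint64_t hits = 0;

  private:
    struct Entry { uint64_t bytes; StaticInstPtr inst; };
    std::unordered_map<uint64_t, Entry> cache;
};
```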

Worth mentioning: this is slightly more elaborate on x86 than in the common
case, since the version of the cache we're using takes contextualizing
state like the operating mode into account, and checking whether the bytes
of an instruction match is a little more complex because of variable
instruction sizes and the lack of alignment restrictions.

When that fails, a second cache is consulted which maps from an
ExtMachInst, a contextualized version of the instruction bytes from memory,
to a StaticInst, without considering the address:

mi_map.accesses                                 67914
# Number of accesses to the mi map decoder cache
mi_map.hit_rate                                  0.63
# Hit rate in the mi map cache
mi_map.hits                                     42979
# Number of hits in the mi map cache

As you can see, the hit rate here is not terrible, but it's a lot lower.
For this cache to be worthwhile in this case, maintaining it must be at
least about 5000 times faster than constructing a new StaticInst.
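To make that ratio concrete, here is the back-of-the-envelope arithmetic from the stats above. The assumption (mine, not verified against the code) is that the mi_map has to be maintained on every decoder access but only pays off on each of its hits:

```cpp
#include <cassert>
#include <cstdint>

// Numbers quoted from the stats in this post.
constexpr uint64_t totalAccesses = 224432609; // addr_map.accesses
constexpr uint64_t miMapHits = 42979;         // mi_map.hits

// Maintenance work is paid ~224M times; a StaticInst construction is
// avoided only ~43K times. Break-even ratio of the two costs:
constexpr uint64_t breakEvenRatio = totalAccesses / miMapHits; // ~5200
```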

One thing I'm not really sure of is why, in the first case, we look up an
instruction based on its address first instead of using just the bytes
which make it up. Perhaps it's to keep the hash map we're looking in small,
to make the lookup faster? If the PC doesn't play a role in the ExtMachInst
(I'm pretty sure it never does?), then I don't think it can matter from a
correctness perspective, since I presume we get the right answer in the
second lookup and we don't check against the PC.

An experiment for another day would be to use the raw bytes of an
instruction as the hash key and see how well that performs. That would be
an easier experiment to do on an ISA which has a one-to-one relationship
between fetched chunks of memory and instructions, like SPARC or perhaps
RISCV(?).
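For illustration, here's a sketch of what that experiment might look like on a fixed-width ISA, keying the cache directly on the raw 32-bit instruction word with no address involved at all. The names are invented, and compressed encodings (like RISC-V's 16-bit RVC instructions) would complicate this, hence the "(?)" above:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Simplified stand-ins for gem5's real types.
struct StaticInst { uint32_t machInst; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// Single-level cache keyed on the raw fetched instruction word:
// identical words decode to the same StaticInst regardless of address.
class RawBytesCache
{
  public:
    StaticInstPtr
    decode(uint32_t machInst)
    {
        auto it = cache.find(machInst);
        if (it != cache.end())
            return it->second;
        // "Decode" (here: construct) a new instruction and cache it.
        auto inst = std::make_shared<StaticInst>(StaticInst{machInst});
        cache.emplace(machInst, inst);
        return inst;
    }

  private:
    std::unordered_map<uint32_t, StaticInstPtr> cache;
};
```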

Also, the address-based decode cache uses a two-stage mini cache to look up
the page which corresponds to a given address. I don't have the hit rate of
that handy at the moment, but I think the first level hits about 98% of the
time. If maintaining that small, two-element LRU is less than 50 times
faster than the hash map lookup, then that may be something we want to
revisit.
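A guess at the shape of that two-entry structure (a hedged sketch, not the actual gem5 code): check the most recently used page first, then the other, and only fall back to the expensive hash map lookup when both miss.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Two-entry LRU cache mapping addresses to their page's decode state.
// Entry 0 is always the most recently used.
struct PageLruCache
{
    static constexpr uint64_t PageMask = ~uint64_t(0xfff); // 4 KiB pages

    // Returns true (and sets 'page') on a mini-cache hit.
    bool
    lookup(uint64_t addr, void *&page)
    {
        uint64_t tag = addr & PageMask;
        if (valid[0] && tags[0] == tag) {
            page = pages[0];
            return true;
        }
        if (valid[1] && tags[1] == tag) {
            // Promote the second entry to most recently used.
            std::swap(tags[0], tags[1]);
            std::swap(pages[0], pages[1]);
            std::swap(valid[0], valid[1]);
            page = pages[0];
            return true;
        }
        return false; // caller falls back to the hash map
    }

    // Install a page found via the slow path, evicting the LRU entry.
    void
    insert(uint64_t addr, void *page)
    {
        tags[1] = addr & PageMask;
        pages[1] = page;
        valid[1] = true;
        std::swap(tags[0], tags[1]);
        std::swap(pages[0], pages[1]);
        std::swap(valid[0], valid[1]);
    }

    uint64_t tags[2] = {0, 0};
    void *pages[2] = {nullptr, nullptr};
    bool valid[2] = {false, false};
};
```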
_______________________________________________
gem5-dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]