Hi folks. I thought I'd do a quick measurement of the effectiveness of the various caches the decoder uses to cache StaticInsts, using x86.
For SE mode, I did a very simple run of hello world. For such a short program the hit rate was pretty low, and I don't think it's representative. To get something a little more realistic, I did a Linux boot up to mounting the root FS, and got better results:

    addr_map.accesses    224432609  # Number of accesses to the addr map decoder cache
    addr_map.hit_rate         1.00  # Hit rate in the addr map cache
    addr_map.hits        224364693  # Number of hits in the addr map cache

This is the cache which looks up what memory is at a given address and, if it matches what was there the last time, returns the same StaticInst. As you can see, the hit rate is very high. It's worth mentioning that this is slightly more elaborate on x86 than in the common case, since the version of the cache we're using is based on contextualizing state like the operating mode, and the mechanics of checking whether the bytes of an instruction match are a little more complex because of the variable instruction size and the lack of alignment restrictions.

When/if that fails, a second cache is consulted, which maps from an ExtMachInst (a contextualized version of the instruction from memory) without considering the address or memory:

    mi_map.accesses    67914  # Number of accesses to the mi map decoder cache
    mi_map.hit_rate     0.63  # Hit rate in the mi map cache
    mi_map.hits        42979  # Number of hits in the mi map cache

As you can see, the hit rate here is not terrible, but it's a lot lower. For this cache to be worthwhile in this case, maintaining it must be at least about 5000 times faster than constructing a new StaticInst.

One thing I'm not really sure of is why, in the first case, we look an instruction up by its address first instead of using just the bytes which make it up. Perhaps it's to keep the hash map we're looking in small, to make the lookup faster?
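To make the two-level structure concrete, here's a heavily simplified sketch of the lookup order described above. All of the names here (DecodeCacheSketch, AddrEntry, etc.) are illustrative stand-ins, not gem5's actual classes; the real DecodeCache, ExtMachInst, and StaticInst types are considerably more elaborate:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Stand-in for the contextualized instruction bytes. In real x86 this also
// folds in mode bits etc., and the byte-match check is more involved.
using ExtMachInst = uint64_t;

struct StaticInst
{
    ExtMachInst machInst;
    explicit StaticInst(ExtMachInst emi) : machInst(emi) {}
};
using StaticInstPtr = std::shared_ptr<StaticInst>;

class DecodeCacheSketch
{
    // First level: keyed by address. A hit also requires that the
    // (contextualized) instruction there matches what we saw last time.
    struct AddrEntry { ExtMachInst emi; StaticInstPtr inst; };
    std::unordered_map<uint64_t, AddrEntry> addrMap;

    // Second level: keyed by the ExtMachInst itself, address ignored.
    std::unordered_map<ExtMachInst, StaticInstPtr> miMap;

  public:
    StaticInstPtr
    decode(uint64_t addr, ExtMachInst emi)
    {
        auto it = addrMap.find(addr);
        if (it != addrMap.end() && it->second.emi == emi)
            return it->second.inst;                 // addr_map hit

        StaticInstPtr inst;
        auto mi = miMap.find(emi);
        if (mi != miMap.end()) {
            inst = mi->second;                      // mi_map hit
        } else {
            inst = std::make_shared<StaticInst>(emi); // full decode
            miMap[emi] = inst;
        }
        addrMap[addr] = {emi, inst};
        return inst;
    }
};
```

The point of the first level is that when the same address still holds the same bytes, we skip hashing the (larger, more expensive to compare) ExtMachInst entirely.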
If the PC doesn't play a role in the ExtMachInst (I'm pretty sure it never does?), then I don't think this can matter from a correctness perspective, since I presume we get the right answer in the second lookup and we don't check against the PC there. An experiment for another day would be to use the raw bytes of an instruction as the hash key and see how well that performs. That would be an easier experiment to do on an ISA with a one-to-one relationship between fetched chunks of memory and instructions, like SPARC or perhaps RISC-V(?).

Also, the address-based decode cache uses a two-stage mini cache to look up the page which corresponds to a given address. I don't have its hit rate handy at the moment, but I think the first level hits about 98% of the time. If maintaining that small, two-element LRU isn't at least about 50 times faster than the hash map lookup, then it may be something we want to revisit.
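For reference, a two-element LRU of the kind mentioned above is tiny; here's a hypothetical sketch (not gem5's actual implementation) of what its maintenance cost looks like: a hit is at most two comparisons plus a swap, which is the cost being weighed against a hash map probe on the ~2% of misses:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical two-entry LRU front for a page lookup. entries[0] is the
// most recently used slot; a hit in entries[1] promotes it.
template <typename Key, typename Value>
class TwoEntryLru
{
    std::pair<Key, Value *> entries[2] {};
    bool valid[2] {false, false};

  public:
    Value *
    find(const Key &key)
    {
        if (valid[0] && entries[0].first == key)
            return entries[0].second;
        if (valid[1] && entries[1].first == key) {
            std::swap(entries[0], entries[1]);  // promote to MRU
            std::swap(valid[0], valid[1]);
            return entries[0].second;
        }
        return nullptr;  // miss: caller falls back to the hash map
    }

    void
    insert(const Key &key, Value *value)
    {
        entries[1] = entries[0];  // old MRU becomes LRU; old LRU is evicted
        valid[1] = valid[0];
        entries[0] = {key, value};
        valid[0] = true;
    }
};
```

With a ~98% first-level hit rate, the hash map is only reached once per ~50 accesses, which is where the "50 times faster" break-even figure comes from.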
_______________________________________________
gem5-dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
