I was looking at the MOESI_hammer protocol. I think Steve's observation that extra tag lookups are happening in the cache is correct. In particular, I noticed that in the getState() and setState() functions, isTagPresent(address) is called first, and depending on the result (true or false), getCacheEntry(address) is called. Surprisingly, getCacheEntry() calls isTagPresent() again. These calls are in the file src/mem/protocol/MOESI_hammer-cache.sm.
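A minimal sketch of the pattern, with the cache reduced to a std::unordered_map (the real Ruby CacheMemory class and its signatures differ; the names here just mirror the ones in the .sm file). It shows the repeated hash lookup and one way to fold it into a single find() that returns a pointer:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct CacheEntry { int state = 0; };

struct CacheMemory {
    std::unordered_map<uint64_t, CacheEntry> m_tags;

    // Current pattern: callers test isTagPresent() first, then
    // getCacheEntry() repeats the very same hash lookup internally.
    bool isTagPresent(uint64_t addr) const {
        return m_tags.find(addr) != m_tags.end();   // lookup #1
    }
    CacheEntry &getCacheEntry(uint64_t addr) {
        assert(isTagPresent(addr));                 // lookup #2
        return m_tags[addr];                        // lookup #3
    }

    // Single-lookup alternative (hypothetical): return a pointer,
    // nullptr on miss, so hit/miss test and entry access share one find().
    CacheEntry *lookup(uint64_t addr) {
        auto it = m_tags.find(addr);                // one lookup
        return it != m_tags.end() ? &it->second : nullptr;
    }
};
```

With the pointer-returning lookup(), getState() could do `if (CacheEntry *e = cache.lookup(addr)) return e->state;` with a single hash probe instead of three.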

Thanks
Nilay

On Tue, 16 Nov 2010, Nilay Vaish wrote:

I profiled M5 using the MOESI_CMP_directory and MOESI_CMP_token protocols. isTagPresent() dominates in both of these protocols as well, but the percentage of simulation time it consumes is lower (it varies from 10% to 16%). I will take a look at the assembly code for the direct cache set indexing approach.

In order to reduce the number of calls made to the tag lookup, I would need to read about the protocol itself. Can you point me to some documentation on the MOESI_hammer protocol?

--
Nilay


On Fri, 12 Nov 2010, Steve Reinhardt wrote:

On Fri, Nov 12, 2010 at 1:10 PM, Nilay Vaish <[email protected]> wrote:

I tried a couple of ideas for improving the performance of the findTagInSet()
function: having one hash table per cache set, and replacing the hash table
with a two-dimensional array indexed by <cache set, cache way>. Neither of
these showed a significant enough change in the percentage of time taken by
the isTagPresent() function to support a definite conclusion.
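For reference, a sketch of the two-dimensional <set, way> indexing idea (illustrative only; the field names and the insert() helper are made up, and Ruby's real structures differ). The point is that the set index comes straight from address bits, so a lookup is one shift/mask plus a short scan over the ways:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical direct-indexed tag array: sets x ways instead of a
// per-address hash map.
struct TagArray {
    struct Entry { uint64_t tag = 0; bool valid = false; };

    TagArray(int numSets, int assoc, int blockBits)
        : m_assoc(assoc), m_blockBits(blockBits),
          m_setMask(numSets - 1),   // assumes numSets is a power of two
          m_sets(numSets, std::vector<Entry>(assoc)) {}

    // findTagInSet() analogue: index the set directly, scan the ways.
    // Returns the way on a hit, -1 on a miss.
    int findTagInSet(uint64_t addr) const {
        uint64_t tag = addr >> m_blockBits;
        const std::vector<Entry> &set = m_sets[tag & m_setMask];
        for (int way = 0; way < m_assoc; ++way)
            if (set[way].valid && set[way].tag == tag)
                return way;
        return -1;
    }

    // Test helper, not part of the idea itself.
    void insert(uint64_t addr, int way) {
        uint64_t tag = addr >> m_blockBits;
        m_sets[tag & m_setMask][way] = {tag, true};
    }

    int m_assoc, m_blockBits;
    uint64_t m_setMask;
    std::vector<std::vector<Entry>> m_sets;
};
```

Compared with std::hash_map, this avoids the hash computation and bucket-chain pointer chasing, at the cost of scanning up to assoc tag entries per lookup.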

I looked at the assembly code generated for the isTagPresent() function. Note
that the compiler inlines the function findTagInSet(). The assembly code is
about 100 lines long. The find function for std::hash_map also gets inlined.
There is one division operation in the code and several load operations. If we
assume that the hash function keeps the average occupancy of each bucket close
to 1, then I think the time taken by the function is dominated by the loads.
It might be that a lot of the loads miss in the cache.


I'm a little surprised that the direct cache set indexing approach was not
faster since I'd think that would be far less than 100 instructions, but
you're right that issues like whether loads hit or miss in the cache will
have a large impact.


As far as reordering the tags is concerned, since the hash_map is not
directly under our control, we will have to delete the tag entry in the hash
table and insert it again to make sure that it is the first entry that is
searched.


The reordering only makes sense if we replace the hash table with a more
conventional tag array.
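A sketch of what that reordering could look like within one set of a conventional tag array (hypothetical names; Ruby's real CacheMemory tracks ways differently): on a hit, rotate the matching entry to the front so the common re-access case succeeds on the first tag compare.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One set of a conventional tag array with MRU-first ordering.
struct Set {
    std::vector<uint64_t> tags;   // index 0 is the most recently used way

    // Scan the set; on a hit, move the matching tag to the front.
    bool lookupMRU(uint64_t tag) {
        auto it = std::find(tags.begin(), tags.end(), tag);
        if (it == tags.end())
            return false;                        // miss: order unchanged
        std::rotate(tags.begin(), it, it + 1);   // hit: shift tag to front
        return true;
    }
};
```

One caveat with this scheme: if other state is indexed by <set, way>, physically moving tags changes way numbers, so either the associated entries must move too or the MRU order has to live in a separate small array of way indices.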


Right now I am profiling with coherence protocol as MOESI_hammer. I am
thinking of profiling using a different protocol to make sure that it is not
an artifact of the protocol in use.


That sounds like a good idea.

All in all, we would ideally like to both speed up individual calls and
reduce the number of calls. IIRC, gprof indicated that findTagInSet() was
called 4-5X more frequently than there were cache accesses, which makes no
sense to me; it seems like a typical cache hit should only require a single
tag lookup.

Another thing to keep in mind is that typical programs really have very high
cache hit rates, so another approach is to look at what happens in the process
of servicing an L1 cache hit and optimize that path as much as possible.

Steve


_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
