I tried couple of ideas for improving the performance of the
findTagInSet() function. These include having one hash table per cache set
and replacing the hash table with a two dimensional array indexed using
<cache set, cache way>. Neither of these ideas showed significant enough
change in the percentage of the time taken by isTagPresent() function to
come to a definite conclusion.
I looked the assembly code generated for the isTagPresent() function. Note
that the compiler inlines the funtion findTagIsPresent(). The assembly
code is about 100 lines long. Again, the find function for std::hash_map
gets inlined. There is one division operation in the code and several load
operations. If we were to assume that the hash function is able to keep
the average occupancy of each bucket close to 1, then I think the time
taken by the function would be determined by the time taken by the loads.
It might be that a lot of loads end up missing in the cache.
As far as reordering the tags is concerned, since the hash_map is not
directly under our control, we will have to delete the tag entry in the
hash table and insert it again to make sure that it is the first entry
that is searched.
Right now I am profiling with coherence protocol as MOESI_hammer. I am
thinking of profiling using a different protocol to make sure that it is
not an artifact of the protocol in use.
--
Nilay
On Fri, 5 Nov 2010, Steve Reinhardt wrote:
If that's where a significant amount of time is being spent, we need to
either call it less or make it run faster :-). Doing both is even better.
At a high level, the process of looking something up in an N-way associative
cache should not take that many instructions if N is small (a shift and an
add to find the tag index, at most N loads and compares to match the tags).
If we reorder the tags to search the MRU block first then we will
probabilistically keep the average number of tags searched well below N.
Steve
On Fri, Nov 5, 2010 at 9:27 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
I had another look at the profile output. On the machine that I am using (a
3.2 GHz Pentium 4), each call to isTagPresent() take about 57 ns. Assuming
that the pipeline is functioning at is best, I think the number of uops
executed would be ~500. Is that too much for this function?
--
Nilay
On Fri, 5 Nov 2010, Nilay Vaish wrote:
Do you know what hash function is in use? Seems to me that the default
hash function is to hash to self. May be we should test with a different
hash function.
--
Nilay
On Fri, 5 Nov 2010, Steve Reinhardt wrote:
You can look at the call graph profile further down in the gprof output
to
figure out how much time is spent in functions that get called from
isTagPresent. If it's not specifically calling out findTagInSet, it may
be
because it's inlined in isTagPresent.
Steve
On Fri, Nov 5, 2010 at 7:58 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
I ran ALPHA_FS_MOESI_hammer using the following command --
./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py
I don't know how the benchmark is picked in case none is specified.
Below
is the gprof output --
% cumulative self self total
time seconds seconds calls s/call s/call name
19.72 51.22 51.22 925285266 0.00 0.00
CacheMemory::isTagPresent(Address const&) const
5.59 65.74 14.52 229035720 0.00 0.00
Histogram::add(long
long)
3.57 75.02 9.28 212664644 0.00 0.00
CacheMemory::lookup(Address const&)
2.53 81.59 6.57 47830136 0.00 0.00
L1Cache_Controller::wakeup()
The output shows that about a fifth of the time is spent in the
isTagPresent() function.
bool
CacheMemory::isTagPresent(const Address& address) const
{
assert(address == line_address(address));
Index cacheSet = addressToCacheSet(address);
int loc = findTagInSet(cacheSet, address);
if (loc == -1) {
// We didn't find the tag
DPRINTF(RubyCache, "No tag match for address: %s\n", address);
return false;
}
DPRINTF(RubyCache, "address: %s found\n", address);
return true;
}
Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert()
and the DPRINTF() will not get compiled. The addressToCacheSet()
function
does some bitwise operations and some arithmetic operations. So it is
expected that it would not consume much time. So, most likely the
findTagInSet() function takes a major portion of the overall time
required
by the isTagPresent() function.
--
Nilay
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev