Brad and I had a discussion on Tuesday. We are still thinking about how to resolve this issue.

As a stopgap, I added a couple of variables to the CacheMemory class that track the last address for which a lookup was performed. I am posting the profiling results from before and after the change. I compiled m5 with the MOESI_hammer protocol and let the simulation run for 20,000,000,000 ticks. I would suggest not reading much into the absolute time values, since they vary with the load on the machine.

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 18.27     61.32    61.32 888688475     0.00     0.00  CacheMemory::isTagPresent(Address const&) const
  5.97     81.36    20.04 219389124     0.00     0.00  Histogram::add(long long)
  2.99     91.39    10.03 204574578     0.00     0.00  CacheMemory::lookup(Address const&)
  2.56     99.97     8.58  12852725     0.00     0.00  MemoryControl::executeCycle()
  2.51    108.38     8.41  45887816     0.00     0.00  L1Cache_Controller::wakeup()



Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 11.38     41.64    41.64 888688475     0.00     0.00  CacheMemory::isTagPresent(Address const&)
  5.99     63.55    21.91 219389124     0.00     0.00  Histogram::add(long long)
  2.90     74.16    10.61  45887816     0.00     0.00  L1Cache_Controller::wakeup()
  2.76     84.25    10.09  12852725     0.00     0.00  MemoryControl::executeCycle()
  2.49     93.36     9.11  34522950     0.00     0.00  BaseSimpleCPU::preExecute()


I can post the patch on the review board if this looks good.

--
Nilay



On Tue, 23 Nov 2010, Nilay Vaish wrote:

Brad and I will be having a discussion today on how to resolve this issue.

--
Nilay


On Tue, 23 Nov 2010, Steve Reinhardt wrote:

Thanks for tracking that down; that confirms my suspicions.

I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know if
that's just an API change or if that's something that needs to be folded
into SLICCer.  (BTW, what is the status of SLICCer?  Is anyone working on
it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided by
building a "software cache" into isTagPresent(), by storing the last address
looked up along with a pointer to the block, then just checking on each call
to see if we're looking up the same address as last time and if so just
returning the same pointer before resorting to the hash table.  I hope that
doesn't lead to any coherence problems with the block changing out from
under this cached copy... if so, perhaps an additional block check is
required on hits.
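
To make the idea concrete, here is a rough sketch of such a one-entry "software cache" in front of a hash-table lookup. It is not the actual CacheMemory code or the patch under discussion: TagStore, Block, and every member name below are hypothetical, and a std::unordered_map stands in for the real tag index. The cached entry is simply dropped whenever a block is allocated or deallocated, which is one conservative way to avoid returning a stale pointer.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a cache block; not the real Ruby entry type.
struct Block {
    int data = 0;
};

// Hypothetical, simplified tag store; not the real CacheMemory class.
class TagStore {
  public:
    // Returns the block for addr, or nullptr if the tag is not present.
    Block *lookup(std::uint64_t addr) {
        // Fast path: same address as the previous lookup, so skip the
        // hash table and reuse the remembered result.
        if (m_last_valid && addr == m_last_addr) {
            ++m_fast_hits;
            return m_last_block;
        }
        // Slow path: consult the hash table, then remember the result
        // (including a miss, recorded as nullptr).
        auto it = m_tags.find(addr);
        m_last_block = (it == m_tags.end()) ? nullptr : &it->second;
        m_last_addr = addr;
        m_last_valid = true;
        return m_last_block;
    }

    void allocate(std::uint64_t addr) {
        m_tags[addr];          // creates a default block if none exists
        m_last_valid = false;  // a remembered miss may no longer be true
    }

    void deallocate(std::uint64_t addr) {
        m_tags.erase(addr);
        m_last_valid = false;  // the remembered pointer may now dangle
    }

    // Number of lookups answered without touching the hash table.
    std::uint64_t fastHits() const { return m_fast_hits; }

  private:
    std::unordered_map<std::uint64_t, Block> m_tags;
    std::uint64_t m_last_addr = 0;
    Block *m_last_block = nullptr;
    bool m_last_valid = false;
    std::uint64_t m_fast_hits = 0;
};
```

Dropping the remembered entry on every allocate and deallocate is deliberately blunt. Invalidating only when the affected address matches the remembered one would keep more fast-path hits, at the cost of one more comparison per update.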

Steve


_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
