Brad and I had a discussion on Tuesday. We are still thinking about how to
resolve this issue.
As a stopgap, I added a couple of variables to the CacheMemory class that
track the last address for which a lookup was performed. I am posting
profiling results from before and after the change, in that order. I
compiled m5 with the MOESI_hammer protocol and let the simulation run for
20,000,000,000 ticks. I would suggest not reading much into the absolute
time values, since they vary with the load on the machine.
Each sample counts as 0.01 seconds.
     % cumulative     self               self   total
  time    seconds  seconds      calls  s/call  s/call  name
 18.27      61.32    61.32  888688475    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
  5.97      81.36    20.04  219389124    0.00    0.00  Histogram::add(long long)
  2.99      91.39    10.03  204574578    0.00    0.00  CacheMemory::lookup(Address const&)
  2.56      99.97     8.58   12852725    0.00    0.00  MemoryControl::executeCycle()
  2.51     108.38     8.41   45887816    0.00    0.00  L1Cache_Controller::wakeup()
Each sample counts as 0.01 seconds.
     % cumulative     self               self   total
  time    seconds  seconds      calls  s/call  s/call  name
 11.38      41.64    41.64  888688475    0.00    0.00  CacheMemory::isTagPresent(Address const&)
  5.99      63.55    21.91  219389124    0.00    0.00  Histogram::add(long long)
  2.90      74.16    10.61   45887816    0.00    0.00  L1Cache_Controller::wakeup()
  2.76      84.25    10.09   12852725    0.00    0.00  MemoryControl::executeCycle()
  2.49      93.36     9.11   34522950    0.00    0.00  BaseSimpleCPU::preExecute()
I can post the patch on the review board if this looks good.
--
Nilay
On Tue, 23 Nov 2010, Nilay Vaish wrote:
Brad and I will be having a discussion today on how to resolve this issue.
--
Nilay
On Tue, 23 Nov 2010, Steve Reinhardt wrote:
Thanks for tracking that down; that confirms my suspicions.
I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know if
that's just an API change or if that's something that needs to be folded
into SLICCer. (BTW, what is the status of SLICCer? Is anyone working on
it, or likely to work on it again?)
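To make the "multiple tag lookups" point concrete, here is a minimal sketch of what an API-level change could look like. It is not the Ruby code: Addr, Entry, SimpleCache, tryLookup() and the probes counter are all hypothetical names invented for illustration, and std::unordered_map stands in for whatever hash table CacheMemory actually uses.

```cpp
#include <cstdint>
#include <unordered_map>

using Addr = std::uint64_t;            // hypothetical stand-in for Address
struct Entry { int permission = 0; };  // hypothetical stand-in for a cache entry

class SimpleCache {
  public:
    void allocate(Addr a) { table_[a] = Entry{}; }

    // Two-step style: each call performs its own hash-table probe.
    bool isTagPresent(Addr a) const { ++probes; return table_.count(a) != 0; }
    Entry &lookup(Addr a) { ++probes; return table_.at(a); }

    // One-step style: a single probe both reports presence (non-null)
    // and hands back the entry.
    Entry *tryLookup(Addr a) {
        ++probes;
        auto it = table_.find(a);
        return it == table_.end() ? nullptr : &it->second;
    }

    mutable unsigned long probes = 0;  // counts hash-table probes

  private:
    std::unordered_map<Addr, Entry> table_;
};
```

With the two-step style, a guarded access such as "if (isTagPresent(a)) lookup(a)" costs two probes; the one-step call costs one. Whether the generated protocol code can be changed to use such a call is exactly the open question above.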
In the short term, it's possible that some of the overhead can be avoided by
building a "software cache" into isTagPresent(), by storing the last address
looked up along with a pointer to the block, then just checking on each call
to see if we're looking up the same address as last time and if so just
returning the same pointer before resorting to the hash table. I hope that
doesn't lead to any coherence problems with the block changing out from
under this cached copy... if so, perhaps an additional block check is
required on hits.
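A rough sketch of the one-entry "software cache" idea, under stated assumptions: Addr, Block, TagStore, find() and fastHits are hypothetical names, and std::unordered_map stands in for the real hash table. The stale-pointer worry is handled here by dropping the cached entry whenever that block is erased; the real CacheMemory may need a different check.

```cpp
#include <cstdint>
#include <unordered_map>

using Addr = std::uint64_t;     // hypothetical stand-in for Address
struct Block { int state = 0; }; // hypothetical stand-in for a cache block

class TagStore {
  public:
    void insert(Addr a) { table_[a] = Block{}; }

    void erase(Addr a) {
        // Invalidate the one-entry cache if it refers to the removed block,
        // so a later hit can never return a dangling pointer.
        if (lastValid_ && lastAddr_ == a) {
            lastValid_ = false;
            lastBlock_ = nullptr;
        }
        table_.erase(a);
    }

    // Returns the block for address a, or nullptr if it is not present.
    Block *find(Addr a) {
        // Fast path: same address as the previous successful lookup.
        if (lastValid_ && lastAddr_ == a) {
            ++fastHits;
            return lastBlock_;
        }
        // Slow path: go to the hash table and remember the result.
        auto it = table_.find(a);
        if (it == table_.end()) return nullptr;
        lastAddr_ = a;
        lastBlock_ = &it->second;  // element pointers survive rehashing
        lastValid_ = true;
        return lastBlock_;
    }

    bool isTagPresent(Addr a) { return find(a) != nullptr; }

    unsigned long fastHits = 0;  // lookups answered without a hash probe

  private:
    std::unordered_map<Addr, Block> table_;
    Addr lastAddr_ = 0;
    Block *lastBlock_ = nullptr;
    bool lastValid_ = false;
};
```

Note that isTagPresent() can no longer be const in this sketch, because a miss in the one-entry cache updates the remembered address and pointer.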
Steve