I profiled the un-modified and the modified m5 ten times (this time there
was no load on the machine). Here are the average results:
                  % time   std. dev   actual time   std. dev
un-modified
  isTagPresent     19.99       0.35         47.17       1.23
  cumulative      100          0.00        235.91       3.37
modified
  isTagPresent     10.35       0.28         21.22       0.57
  cumulative      100          0.00        205.11       2.94
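To put the averages above in relative terms, here is a minimal C++ sketch of the arithmetic. The helper name percentReduction is hypothetical and not part of m5. Applied to the table, it gives roughly a 13% drop in cumulative time (235.91 to 205.11) and roughly a 55% drop in the time attributed to isTagPresent (47.17 to 21.22).

```cpp
#include <cassert>

// Hypothetical helper (not part of m5): relative reduction, in percent,
// when a measured time drops from `before` to `after`.
double percentReduction(double before, double after)
{
    assert(before > 0.0);  // a zero or negative baseline makes no sense here
    return 100.0 * (before - after) / before;
}
```

For example, percentReduction(235.91, 205.11) evaluates the cumulative-time change, and percentReduction(47.17, 21.22) the isTagPresent change.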
Below is the patch, though it may not apply cleanly to the current version
of m5, since I have a few uncommitted patches enqueued.
# HG changeset patch
# Parent 7ac53378e03b5116c48e6076167de6a2a2e06158
diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.cc
--- a/src/mem/ruby/system/CacheMemory.cc Thu Nov 25 13:23:51 2010 -0600
+++ b/src/mem/ruby/system/CacheMemory.cc Thu Nov 25 17:30:58 2010 -0600
@@ -84,6 +84,8 @@
m_locked[i][j] = -1;
}
}
+
+ m_valid_mru_address = false;
}
CacheMemory::~CacheMemory()
@@ -135,15 +137,26 @@
// Given a cache index: returns the index of the tag in a set.
// returns -1 if the tag is not found.
int
-CacheMemory::findTagInSet(Index cacheSet, const Address& tag) const
+CacheMemory::findTagInSet(Index cacheSet, const Address& tag)
{
assert(tag == line_address(tag));
+
+ if(m_valid_mru_address && m_mru_address == tag) return m_mru_tag_index;
+
// search the set for the tags
+ m_valid_mru_address = true;
+ m_mru_address.setAddress(tag.getAddress());
+
m5::hash_map<Address, int>::const_iterator it =
m_tag_index.find(tag);
if (it != m_tag_index.end())
if (m_cache[cacheSet][it->second]->m_Permission !=
AccessPermission_NotPresent)
+ {
+ m_mru_tag_index = it->second;
return it->second;
+ }
+
+ m_mru_tag_index = -1;
return -1; // Not found
}
@@ -215,7 +228,7 @@
// tests to see if an address is present in the cache
bool
-CacheMemory::isTagPresent(const Address& address) const
+CacheMemory::isTagPresent(const Address& address)
{
assert(address == line_address(address));
Index cacheSet = addressToCacheSet(address);
@@ -276,6 +289,10 @@
m_locked[cacheSet][i] = -1;
m_tag_index[address] = i;
+ m_valid_mru_address = true;
+ m_mru_address.setAddress(address.getAddress());
+ m_mru_tag_index = i;
+
m_replacementPolicy_ptr->
touch(cacheSet, i, g_eventQueue_ptr->getTime());
@@ -300,6 +317,8 @@
address);
m_locked[cacheSet][loc] = -1;
m_tag_index.erase(address);
+
+ m_valid_mru_address = false;
}
}
@@ -327,18 +346,18 @@
}
// looks an address up in the cache
-const AbstractCacheEntry&
-CacheMemory::lookup(const Address& address) const
+/*const AbstractCacheEntry&
+CacheMemory::lookup(const Address& address)
{
assert(address == line_address(address));
Index cacheSet = addressToCacheSet(address);
int loc = findTagInSet(cacheSet, address);
assert(loc != -1);
return *m_cache[cacheSet][loc];
-}
+}*/
AccessPermission
-CacheMemory::getPermission(const Address& address) const
+CacheMemory::getPermission(const Address& address)
{
assert(address == line_address(address));
return lookup(address).m_Permission;
diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.hh
--- a/src/mem/ruby/system/CacheMemory.hh Thu Nov 25 13:23:51 2010 -0600
+++ b/src/mem/ruby/system/CacheMemory.hh Thu Nov 25 17:30:58 2010 -0600
@@ -74,7 +74,7 @@
DataBlock*& data_ptr);
// tests to see if an address is present in the cache
- bool isTagPresent(const Address& address) const;
+ bool isTagPresent(const Address& address);
// Returns true if there is:
// a) a tag match on this address or there is
@@ -92,10 +92,10 @@
// looks an address up in the cache
AbstractCacheEntry& lookup(const Address& address);
- const AbstractCacheEntry& lookup(const Address& address) const;
+ //const AbstractCacheEntry& lookup(const Address& address) const;
// Get/Set permission of cache block
- AccessPermission getPermission(const Address& address) const;
+ AccessPermission getPermission(const Address& address);
void changePermission(const Address& address, AccessPermission
new_perm);
int getLatency() const { return m_latency; }
@@ -138,7 +138,7 @@
// Given a cache tag: returns the index of the tag in a set.
// returns -1 if the tag is not found.
- int findTagInSet(Index line, const Address& tag) const;
+ int findTagInSet(Index line, const Address& tag);
int findTagInSetIgnorePermissions(Index cacheSet,
const Address& tag) const;
@@ -170,6 +170,10 @@
int m_cache_num_set_bits;
int m_cache_assoc;
int m_start_index_bit;
+
+ Address m_mru_address;
+ int m_mru_tag_index;
+ bool m_valid_mru_address;
};
#endif // __MEM_RUBY_SYSTEM_CACHEMEMORY_HH__
On Thu, 25 Nov 2010, Nilay Vaish wrote:
Brad and I had a discussion on Tuesday. We are still thinking about how to
resolve this issue.
As a stopgap, I added a couple of variables to the CacheMemory class that
track the last address for which a lookup was performed. I am posting the
profiling results from before and after the change. I compiled m5 with the
MOESI_hammer protocol and let the simulation run for 20,000,000,000 ticks.
I would suggest not reading much into the absolute time values, since they
vary with the load on the machine.
Each sample counts as 0.01 seconds.
  %    cumulative    self                 self     total
 time    seconds    seconds      calls   s/call   s/call  name
18.27      61.32      61.32  888688475     0.00     0.00  CacheMemory::isTagPresent(Address const&) const
 5.97      81.36      20.04  219389124     0.00     0.00  Histogram::add(long long)
 2.99      91.39      10.03  204574578     0.00     0.00  CacheMemory::lookup(Address const&)
 2.56      99.97       8.58   12852725     0.00     0.00  MemoryControl::executeCycle()
 2.51     108.38       8.41   45887816     0.00     0.00  L1Cache_Controller::wakeup()
Each sample counts as 0.01 seconds.
  %    cumulative    self                 self     total
 time    seconds    seconds      calls   s/call   s/call  name
11.38      41.64      41.64  888688475     0.00     0.00  CacheMemory::isTagPresent(Address const&)
 5.99      63.55      21.91  219389124     0.00     0.00  Histogram::add(long long)
 2.90      74.16      10.61   45887816     0.00     0.00  L1Cache_Controller::wakeup()
 2.76      84.25      10.09   12852725     0.00     0.00  MemoryControl::executeCycle()
 2.49      93.36       9.11   34522950     0.00     0.00  BaseSimpleCPU::preExecute()
I can post the patch on the review board if this looks good.
--
Nilay
On Tue, 23 Nov 2010, Nilay Vaish wrote:
Brad and I will be having a discussion today on how to resolve this issue.
--
Nilay
On Tue, 23 Nov 2010, Steve Reinhardt wrote:
Thanks for tracking that down; that confirms my suspicions.
I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know
if that's just an API change or if that's something that needs to be folded
into SLICCer. (BTW, what is the status of SLICCer? Is anyone working on
it, or likely to work on it again?)
In the short term, it's possible that some of the overhead can be avoided
by building a "software cache" into isTagPresent(), by storing the last
address looked up along with a pointer to the block, then just checking on
each call to see if we're looking up the same address as last time and if
so just returning the same pointer before resorting to the hash table. I
hope that doesn't lead to any coherence problems with the block changing
out from under this cached copy... if so, perhaps an additional block check
is required on hits.
Steve
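For readers who want to see the "software cache" idea in isolation, here is a minimal, self-contained C++ sketch. It is not the m5 code: the class TagStore, its members, and the use of std::unordered_map keyed by a plain 64-bit address are all assumptions made for illustration. It remembers the last address looked up and its result, answers repeated lookups from that one-entry cache, and invalidates the entry on erase so a stale index is never returned.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a tag store; NOT the m5 CacheMemory class.
class TagStore
{
  public:
    // Returns the index stored for addr, or -1 if addr is not present.
    int find(std::uint64_t addr)
    {
        // Fast path: same address as the previous lookup.
        if (m_lastValid && m_lastAddr == addr) {
            ++m_fastHits;
            return m_lastIndex;
        }
        // Slow path: consult the hash table, then remember the result
        // (including a miss, recorded as -1).
        auto it = m_index.find(addr);
        m_lastValid = true;
        m_lastAddr = addr;
        m_lastIndex = (it != m_index.end()) ? it->second : -1;
        return m_lastIndex;
    }

    void insert(std::uint64_t addr, int index)
    {
        assert(index >= 0);
        m_index[addr] = index;
        // Keep the one-entry cache coherent with the table.
        m_lastValid = true;
        m_lastAddr = addr;
        m_lastIndex = index;
    }

    void erase(std::uint64_t addr)
    {
        m_index.erase(addr);
        m_lastValid = false;  // drop the cached entry rather than risk staleness
    }

    // Number of lookups answered without touching the hash table.
    unsigned long fastHits() const { return m_fastHits; }

  private:
    std::unordered_map<std::uint64_t, int> m_index;
    std::uint64_t m_lastAddr = 0;
    int m_lastIndex = -1;
    bool m_lastValid = false;
    unsigned long m_fastHits = 0;
};
```

The design point is that every operation that changes the table also updates or invalidates the cached entry. Any other state that affects the answer (a permission change on the block, for instance) would need the same treatment, or a re-check on hits.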