I profiled the un-modified and the modified m5 ten times (this time there was no load on the machine). Here are the average results:

              % time   std. dev  actual time   std. dev
un-modified
isTagPresent   19.99     0.35      47.17         1.23
cumulative     100       0.00      235.91        3.37

modified
isTagPresent   10.35     0.28      21.22         0.57
cumulative     100       0.00      205.11        2.94

Below is the patch, though it may not apply cleanly to current version of m5 since I have few un-committed patches enqueued.


# HG changeset patch
# Parent 7ac53378e03b5116c48e6076167de6a2a2e06158

diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.cc
--- a/src/mem/ruby/system/CacheMemory.cc Thu Nov 25 13:23:51 2010 -0600 +++ b/src/mem/ruby/system/CacheMemory.cc Thu Nov 25 17:30:58 2010 -0600
@@ -84,6 +84,8 @@
             m_locked[i][j] = -1;
         }
     }
+
+    m_valid_mru_address = false;
 }

 CacheMemory::~CacheMemory()
@@ -135,15 +137,26 @@
 // Given a cache index: returns the index of the tag in a set.
 // returns -1 if the tag is not found.
 int
-CacheMemory::findTagInSet(Index cacheSet, const Address& tag) const
+CacheMemory::findTagInSet(Index cacheSet, const Address& tag)
 {
     assert(tag == line_address(tag));
+
+ if(m_valid_mru_address && m_mru_address == tag) return m_mru_tag_index;
+
     // search the set for the tags
+    m_valid_mru_address = true;
+    m_mru_address.setAddress(tag.getAddress());
+
m5::hash_map<Address, int>::const_iterator it = m_tag_index.find(tag);
     if (it != m_tag_index.end())
         if (m_cache[cacheSet][it->second]->m_Permission !=
             AccessPermission_NotPresent)
+        {
+            m_mru_tag_index = it->second;
             return it->second;
+        }
+
+    m_mru_tag_index = -1;
     return -1; // Not found
 }

@@ -215,7 +228,7 @@

 // tests to see if an address is present in the cache
 bool
-CacheMemory::isTagPresent(const Address& address) const
+CacheMemory::isTagPresent(const Address& address)
 {
     assert(address == line_address(address));
     Index cacheSet = addressToCacheSet(address);
@@ -276,6 +289,10 @@
             m_locked[cacheSet][i] = -1;
             m_tag_index[address] = i;

+            m_valid_mru_address = true;
+            m_mru_address.setAddress(address.getAddress());
+            m_mru_tag_index = i;
+
             m_replacementPolicy_ptr->
                 touch(cacheSet, i, g_eventQueue_ptr->getTime());

@@ -300,6 +317,8 @@
                 address);
         m_locked[cacheSet][loc] = -1;
         m_tag_index.erase(address);
+
+        m_valid_mru_address = false;
     }
 }

@@ -327,18 +346,18 @@
 }

 // looks an address up in the cache
-const AbstractCacheEntry&
-CacheMemory::lookup(const Address& address) const
+/*const AbstractCacheEntry&
+CacheMemory::lookup(const Address& address)
 {
     assert(address == line_address(address));
     Index cacheSet = addressToCacheSet(address);
     int loc = findTagInSet(cacheSet, address);
     assert(loc != -1);
     return *m_cache[cacheSet][loc];
-}
+}*/

 AccessPermission
-CacheMemory::getPermission(const Address& address) const
+CacheMemory::getPermission(const Address& address)
 {
     assert(address == line_address(address));
     return lookup(address).m_Permission;
diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.hh
--- a/src/mem/ruby/system/CacheMemory.hh Thu Nov 25 13:23:51 2010 -0600 +++ b/src/mem/ruby/system/CacheMemory.hh Thu Nov 25 17:30:58 2010 -0600
@@ -74,7 +74,7 @@
                          DataBlock*& data_ptr);

     // tests to see if an address is present in the cache
-    bool isTagPresent(const Address& address) const;
+    bool isTagPresent(const Address& address);

     // Returns true if there is:
     //   a) a tag match on this address or there is
@@ -92,10 +92,10 @@

     // looks an address up in the cache
     AbstractCacheEntry& lookup(const Address& address);
-    const AbstractCacheEntry& lookup(const Address& address) const;
+    //const AbstractCacheEntry& lookup(const Address& address) const;

     // Get/Set permission of cache block
-    AccessPermission getPermission(const Address& address) const;
+    AccessPermission getPermission(const Address& address);
void changePermission(const Address& address, AccessPermission new_perm);

     int getLatency() const { return m_latency; }
@@ -138,7 +138,7 @@

     // Given a cache tag: returns the index of the tag in a set.
     // returns -1 if the tag is not found.
-    int findTagInSet(Index line, const Address& tag) const;
+    int findTagInSet(Index line, const Address& tag);
     int findTagInSetIgnorePermissions(Index cacheSet,
                                       const Address& tag) const;

@@ -170,6 +170,10 @@
     int m_cache_num_set_bits;
     int m_cache_assoc;
     int m_start_index_bit;
+
+    Address m_mru_address;
+    int m_mru_tag_index;
+               bool m_valid_mru_address;
 };

 #endif // __MEM_RUBY_SYSTEM_CACHEMEMORY_HH__



On Thu, 25 Nov 2010, Nilay Vaish wrote:

Brad and I had a discussion on Tuesday. We are still thinking how to resolve this issue.

As a stop gap arrangement, I added a couple of variables to the CacheMemory class which track the last address for which the lookup was performed. I am posting the results from profiling before and after the change. I had compile m5 with MOESI_hammer protocol and the simulation was allowed to run for 20,000,000,000 ticks. I would suggest not to look at the absolute time values for they would vary depending on the load on the machine.

Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
18.27 61.32 61.32 888688475 0.00 0.00 CacheMemory::isTagPresent(Address const&) const 5.97 81.36 20.04 219389124 0.00 0.00 Histogram::add(long long) 2.99 91.39 10.03 204574578 0.00 0.00 CacheMemory::lookup(Address const&) 2.56 99.97 8.58 12852725 0.00 0.00 MemoryControl::executeCycle() 2.51 108.38 8.41 45887816 0.00 0.00 L1Cache_Controller::wakeup()



Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
11.38 41.64 41.64 888688475 0.00 0.00 CacheMemory::isTagPresent(Address const&) 5.99 63.55 21.91 219389124 0.00 0.00 Histogram::add(long long) 2.90 74.16 10.61 45887816 0.00 0.00 L1Cache_Controller::wakeup() 2.76 84.25 10.09 12852725 0.00 0.00 MemoryControl::executeCycle() 2.49 93.36 9.11 34522950 0.00 0.00 BaseSimpleCPU::preExecute()


I can post the patch on the review board if this looks good.

--
Nilay



On Tue, 23 Nov 2010, Nilay Vaish wrote:

Brad and I will be having a discussion today on how to resolve this issue.

--
Nilay


On Tue, 23 Nov 2010, Steve Reinhardt wrote:

Thanks for tracking that down; that confirms my suspicions.

I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know if
that's just an API change or if that's something that needs to be folded
into SLICCer.  (BTW, what is the status of SLICCer?  Is anyone working on
it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided by building a "software cache" into isTagPresent(), by storing the last address looked up along with a pointer to the block, then just checking on each call
to see if we're looking up the same address as last time and if so just
returning the same pointer before resorting to the hash table. I hope that
doesn't lead to any coherence problems with the block changing out from
under this cached copy... if so, perhaps an additional block check is
required on hits.

Steve



_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to