Hi Nilay, Good job, this is clearly progress... you've sped up isTagPresent by 2X and the simulation overall by almost 10%. That's nothing to sneeze at. It's sad that isTagPresent is still the top function though. Can you do some tracing or other experiments to get a feel for whether keeping the last N tags instead of the last 1 (for some small value of N, like 2 or 3) would be useful? Just printing out a trace of calls to isTagPresent should be enough to get a feeling for whether that's worth hacking in a test implementation.
Also, I see a lot of your patch has to do with removing const labels from isTagPresent... this is exactly the scenario the 'mutable' keyword was designed for; if you mark your m_mru_* fields as mutable, then you shouldn't have to remove the const labels from any of the function calls. Steve On Fri, Nov 26, 2010 at 9:37 AM, Nilay Vaish <[email protected]> wrote: > I profiled the un-modified and the modified m5 ten times (this time there > was no load on the machine). Here are the average results: > > % time std. dev actual time std. dev > un-modified > isTagPresent 19.99 0.35 47.17 1.23 > cumulative 100 0.00 235.91 3.37 > > modified > isTagPresent 10.35 0.28 21.22 0.57 > cumulative 100 0.00 205.11 2.94 > > Below is the patch, though it may not apply cleanly to current version of > m5 since I have few un-committed patches enqueued. > > > # HG changeset patch > # Parent 7ac53378e03b5116c48e6076167de6a2a2e06158 > > diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.cc > --- a/src/mem/ruby/system/CacheMemory.cc Thu Nov 25 13:23:51 2010 > -0600 > +++ b/src/mem/ruby/system/CacheMemory.cc Thu Nov 25 17:30:58 2010 > -0600 > @@ -84,6 +84,8 @@ > m_locked[i][j] = -1; > } > } > + > + m_valid_mru_address = false; > } > > CacheMemory::~CacheMemory() > @@ -135,15 +137,26 @@ > // Given a cache index: returns the index of the tag in a set. > // returns -1 if the tag is not found. > int > -CacheMemory::findTagInSet(Index cacheSet, const Address& tag) const > +CacheMemory::findTagInSet(Index cacheSet, const Address& tag) > { > assert(tag == line_address(tag)); > + > + if(m_valid_mru_address && m_mru_address == tag) return > m_mru_tag_index; > + > // search the set for the tags > + m_valid_mru_address = true; > + m_mru_address.setAddress(tag.getAddress()); > + > m5::hash_map<Address, int>::const_iterator it = m_tag_index.find(tag); > if (it != m_tag_index.end()) > if (m_cache[cacheSet][it->second]->m_Permission != > AccessPermission_NotPresent) > + { > + m_mru_tag_index = it->second; > return it->second; > + } > + > + m_mru_tag_index = -1; > return -1; // Not found > } > > @@ -215,7 +228,7 @@ > > // tests to see if an address is present in the cache > > bool > -CacheMemory::isTagPresent(const Address& address) const > +CacheMemory::isTagPresent(const Address& address) > > { > assert(address == line_address(address)); > Index cacheSet = addressToCacheSet(address); > @@ -276,6 +289,10 @@ > m_locked[cacheSet][i] = -1; > m_tag_index[address] = i; > > + m_valid_mru_address = true; > + m_mru_address.setAddress(address.getAddress()); > + m_mru_tag_index = i; > + > m_replacementPolicy_ptr-> > touch(cacheSet, i, g_eventQueue_ptr->getTime()); > > @@ -300,6 +317,8 @@ > address); > m_locked[cacheSet][loc] = -1; > m_tag_index.erase(address); > + > + m_valid_mru_address = false; > } > } > > @@ -327,18 +346,18 @@ > } > > // looks an address up in the cache > -const AbstractCacheEntry& > -CacheMemory::lookup(const Address& address) const > +/*const AbstractCacheEntry& > +CacheMemory::lookup(const Address& address) > > { > assert(address == line_address(address)); > Index cacheSet = addressToCacheSet(address); > int loc = findTagInSet(cacheSet, address); > assert(loc != -1); > return *m_cache[cacheSet][loc]; > -} > +}*/ > > AccessPermission > -CacheMemory::getPermission(const Address& address) const > +CacheMemory::getPermission(const Address& address) > > { > assert(address == line_address(address)); > return lookup(address).m_Permission; > diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.hh > --- a/src/mem/ruby/system/CacheMemory.hh Thu Nov 25 13:23:51 2010 > -0600 > +++ b/src/mem/ruby/system/CacheMemory.hh Thu Nov 25 17:30:58 2010 > -0600 > @@ -74,7 +74,7 @@ > DataBlock*& data_ptr); > > // tests to see if an address is present in the cache > - bool isTagPresent(const Address& address) const; > + bool isTagPresent(const Address& address); > > // Returns true if there is: > // a) a tag match on this address or there is > @@ -92,10 +92,10 @@ > > // looks an address up in the cache > AbstractCacheEntry& lookup(const Address& address); > - const AbstractCacheEntry& lookup(const Address& address) const; > + //const AbstractCacheEntry& lookup(const Address& address) const; > > // Get/Set permission of cache block > - AccessPermission getPermission(const Address& address) const; > + AccessPermission getPermission(const Address& address); > void changePermission(const Address& address, AccessPermission > new_perm); > > int getLatency() const { return m_latency; } > @@ -138,7 +138,7 @@ > > // Given a cache tag: returns the index of the tag in a set. > // returns -1 if the tag is not found. > - int findTagInSet(Index line, const Address& tag) const; > + int findTagInSet(Index line, const Address& tag); > int findTagInSetIgnorePermissions(Index cacheSet, > const Address& tag) const; > > @@ -170,6 +170,10 @@ > int m_cache_num_set_bits; > int m_cache_assoc; > int m_start_index_bit; > + > + Address m_mru_address; > + int m_mru_tag_index; > + bool m_valid_mru_address; > }; > > #endif // __MEM_RUBY_SYSTEM_CACHEMEMORY_HH__ > > > > > On Thu, 25 Nov 2010, Nilay Vaish wrote: > > Brad and I had a discussion on Tuesday. We are still thinking how to >> resolve this issue. >> >> As a stop gap arrangement, I added a couple of variables to the >> CacheMemory class which track the last address for which the lookup was >> performed. I am posting the results from profiling before and after the >> change. I had compile m5 with MOESI_hammer protocol and the simulation was >> allowed to run for 20,000,000,000 ticks. I would suggest not to look at the >> absolute time values for they would vary depending on the load on the >> machine. >> >> Each sample counts as 0.01 seconds. >> % cumulative self self total >> time seconds seconds calls s/call s/call name >> 18.27 61.32 61.32 888688475 0.00 0.00 >> CacheMemory::isTagPresent(Address const&) const >> 5.97 81.36 20.04 219389124 0.00 0.00 Histogram::add(long >> long) >> 2.99 91.39 10.03 204574578 0.00 0.00 >> CacheMemory::lookup(Address const&) >> 2.56 99.97 8.58 12852725 0.00 0.00 >> MemoryControl::executeCycle() >> 2.51 108.38 8.41 45887816 0.00 0.00 >> L1Cache_Controller::wakeup() >> >> >> >> Each sample counts as 0.01 seconds. >> % cumulative self self total >> time seconds seconds calls s/call s/call name >> 11.38 41.64 41.64 888688475 0.00 0.00 >> CacheMemory::isTagPresent(Address const&) >> 5.99 63.55 21.91 219389124 0.00 0.00 Histogram::add(long >> long) >> 2.90 74.16 10.61 45887816 0.00 0.00 >> L1Cache_Controller::wakeup() >> 2.76 84.25 10.09 12852725 0.00 0.00 >> MemoryControl::executeCycle() >> 2.49 93.36 9.11 34522950 0.00 0.00 >> BaseSimpleCPU::preExecute() >> >> >> I can post the patch on the review board if this looks good. >> >> -- >> Nilay >> >> >> >> On Tue, 23 Nov 2010, Nilay Vaish wrote: >> >> Brad and I will be having a discussion today on how to resolve this >>> issue. >>> >>> -- >>> Nilay >>> >>> >>> On Tue, 23 Nov 2010, Steve Reinhardt wrote: >>> >>> Thanks for tracking that down; that confirms my suspicions. >>>> >>>> I think the long-term answer is that the system needs to be reworked to >>>> avoid having to do multiple tag lookups for a single access; I don't >>>> know if >>>> that's just an API change or if that's something that needs to be folded >>>> into SLICCer. (BTW, what is the status of SLICCer? Is anyone working >>>> on >>>> it, or likely to work on it again?) >>>> >>>> In the short term, it's possible that some of the overhead can be >>>> avoided by >>>> building a "software cache" into isTagPresent(), by storing the last >>>> address >>>> looked up along with a pointer to the block, then just checking on each >>>> call >>>> to see if we're looking up the same address as last time and if so just >>>> returning the same pointer before resorting to the hash table. I hope >>>> that >>>> doesn't lead to any coherence problems with the block changing out from >>>> under this cached copy... if so, perhaps an additional block check is >>>> required on hits. >>>> >>>> Steve >>>> >>>> >>>> >> _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev >
_______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
