Hi Nilay,

Good job, this is clearly progress... you've sped up isTagPresent by 2X and
the simulation overall by almost 10%.  That's nothing to sneeze at.  It's
sad that isTagPresent is still the top function though.  Can you do some
tracing or other experiments to get a feel for whether keeping the last N
tags instead of the last 1 (for some small value of N, like 2 or 3) would be
useful?  Just printing out a trace of calls to isTagPresent should be enough
to get a feeling for whether that's worth hacking in a test implementation.

Also, I see a lot of your patch has to do with removing const labels from
isTagPresent... this is exactly the scenario the 'mutable' keyword was
designed for; if you mark your m_mru_* fields as mutable, then you shouldn't
have to remove the const labels from any of the function calls.

Steve

On Fri, Nov 26, 2010 at 9:37 AM, Nilay Vaish <[email protected]> wrote:

> I profiled the un-modified and the modified m5 ten times (this time there
> was no load on the machine). Here are the average results:
>
>              % time   std. dev  actual time   std. dev
> un-modified
> isTagPresent   19.99     0.35      47.17         1.23
> cumulative     100       0.00      235.91        3.37
>
> modified
> isTagPresent   10.35     0.28      21.22         0.57
> cumulative     100       0.00      205.11        2.94
>
> Below is the patch, though it may not apply cleanly to current version of
> m5 since I have few un-committed patches enqueued.
>
>
> # HG changeset patch
> # Parent 7ac53378e03b5116c48e6076167de6a2a2e06158
>
> diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.cc
> --- a/src/mem/ruby/system/CacheMemory.cc        Thu Nov 25 13:23:51 2010
> -0600
> +++ b/src/mem/ruby/system/CacheMemory.cc        Thu Nov 25 17:30:58 2010
> -0600
> @@ -84,6 +84,8 @@
>             m_locked[i][j] = -1;
>         }
>     }
> +
> +    m_valid_mru_address = false;
>  }
>
>  CacheMemory::~CacheMemory()
> @@ -135,15 +137,26 @@
>  // Given a cache index: returns the index of the tag in a set.
>  // returns -1 if the tag is not found.
>  int
> -CacheMemory::findTagInSet(Index cacheSet, const Address& tag) const
> +CacheMemory::findTagInSet(Index cacheSet, const Address& tag)
>  {
>     assert(tag == line_address(tag));
> +
> +    if(m_valid_mru_address && m_mru_address == tag) return
> m_mru_tag_index;
> +
>     // search the set for the tags
> +    m_valid_mru_address = true;
> +    m_mru_address.setAddress(tag.getAddress());
> +
>     m5::hash_map<Address, int>::const_iterator it = m_tag_index.find(tag);
>     if (it != m_tag_index.end())
>         if (m_cache[cacheSet][it->second]->m_Permission !=
>             AccessPermission_NotPresent)
> +        {
> +            m_mru_tag_index = it->second;
>             return it->second;
> +        }
> +
> +    m_mru_tag_index = -1;
>     return -1; // Not found
>  }
>
> @@ -215,7 +228,7 @@
>
>  // tests to see if an address is present in the cache
>
>  bool
> -CacheMemory::isTagPresent(const Address& address) const
> +CacheMemory::isTagPresent(const Address& address)
>
>  {
>     assert(address == line_address(address));
>     Index cacheSet = addressToCacheSet(address);
> @@ -276,6 +289,10 @@
>             m_locked[cacheSet][i] = -1;
>             m_tag_index[address] = i;
>
> +            m_valid_mru_address = true;
> +            m_mru_address.setAddress(address.getAddress());
> +            m_mru_tag_index = i;
> +
>             m_replacementPolicy_ptr->
>                 touch(cacheSet, i, g_eventQueue_ptr->getTime());
>
> @@ -300,6 +317,8 @@
>                 address);
>         m_locked[cacheSet][loc] = -1;
>         m_tag_index.erase(address);
> +
> +        m_valid_mru_address = false;
>     }
>  }
>
> @@ -327,18 +346,18 @@
>  }
>
>  // looks an address up in the cache
> -const AbstractCacheEntry&
> -CacheMemory::lookup(const Address& address) const
> +/*const AbstractCacheEntry&
> +CacheMemory::lookup(const Address& address)
>
>  {
>     assert(address == line_address(address));
>     Index cacheSet = addressToCacheSet(address);
>     int loc = findTagInSet(cacheSet, address);
>     assert(loc != -1);
>     return *m_cache[cacheSet][loc];
> -}
> +}*/
>
>  AccessPermission
> -CacheMemory::getPermission(const Address& address) const
> +CacheMemory::getPermission(const Address& address)
>
>  {
>     assert(address == line_address(address));
>     return lookup(address).m_Permission;
> diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.hh
> --- a/src/mem/ruby/system/CacheMemory.hh        Thu Nov 25 13:23:51 2010
> -0600
> +++ b/src/mem/ruby/system/CacheMemory.hh        Thu Nov 25 17:30:58 2010
> -0600
> @@ -74,7 +74,7 @@
>                          DataBlock*& data_ptr);
>
>     // tests to see if an address is present in the cache
> -    bool isTagPresent(const Address& address) const;
> +    bool isTagPresent(const Address& address);
>
>     // Returns true if there is:
>     //   a) a tag match on this address or there is
> @@ -92,10 +92,10 @@
>
>     // looks an address up in the cache
>     AbstractCacheEntry& lookup(const Address& address);
> -    const AbstractCacheEntry& lookup(const Address& address) const;
> +    //const AbstractCacheEntry& lookup(const Address& address) const;
>
>     // Get/Set permission of cache block
> -    AccessPermission getPermission(const Address& address) const;
> +    AccessPermission getPermission(const Address& address);
>     void changePermission(const Address& address, AccessPermission
> new_perm);
>
>     int getLatency() const { return m_latency; }
> @@ -138,7 +138,7 @@
>
>     // Given a cache tag: returns the index of the tag in a set.
>     // returns -1 if the tag is not found.
> -    int findTagInSet(Index line, const Address& tag) const;
> +    int findTagInSet(Index line, const Address& tag);
>     int findTagInSetIgnorePermissions(Index cacheSet,
>                                       const Address& tag) const;
>
> @@ -170,6 +170,10 @@
>     int m_cache_num_set_bits;
>     int m_cache_assoc;
>     int m_start_index_bit;
> +
> +    Address m_mru_address;
> +    int m_mru_tag_index;
> +               bool m_valid_mru_address;
>  };
>
>  #endif // __MEM_RUBY_SYSTEM_CACHEMEMORY_HH__
>
>
>
>
> On Thu, 25 Nov 2010, Nilay Vaish wrote:
>
>  Brad and I had a discussion on Tuesday. We are still thinking how to
>> resolve this issue.
>>
>> As a stop gap arrangement, I added a couple of variables to the
>> CacheMemory class which track the last address for which the lookup was
>> performed. I am posting the results from profiling before and after the
>> change. I had compile m5 with MOESI_hammer protocol and the simulation was
>> allowed to run for 20,000,000,000 ticks. I would suggest not to look at the
>> absolute time values for they would vary depending on the load on the
>> machine.
>>
>> Each sample counts as 0.01 seconds.
>>  %   cumulative   self              self     total
>> time   seconds   seconds    calls   s/call   s/call  name
>> 18.27     61.32    61.32 888688475     0.00     0.00
>> CacheMemory::isTagPresent(Address const&) const
>>  5.97     81.36    20.04 219389124     0.00     0.00  Histogram::add(long
>> long)
>>  2.99     91.39    10.03 204574578     0.00     0.00
>> CacheMemory::lookup(Address const&)
>>  2.56     99.97     8.58 12852725     0.00     0.00
>> MemoryControl::executeCycle()
>>  2.51    108.38     8.41 45887816     0.00     0.00
>> L1Cache_Controller::wakeup()
>>
>>
>>
>> Each sample counts as 0.01 seconds.
>>  %   cumulative   self              self     total
>> time   seconds   seconds    calls   s/call   s/call  name
>> 11.38     41.64    41.64 888688475     0.00     0.00
>> CacheMemory::isTagPresent(Address const&)
>>  5.99     63.55    21.91 219389124     0.00     0.00  Histogram::add(long
>> long)
>>  2.90     74.16    10.61 45887816     0.00     0.00
>> L1Cache_Controller::wakeup()
>>  2.76     84.25    10.09 12852725     0.00     0.00
>> MemoryControl::executeCycle()
>>  2.49     93.36     9.11 34522950     0.00     0.00
>> BaseSimpleCPU::preExecute()
>>
>>
>> I can post the patch on the review board if this looks good.
>>
>> --
>> Nilay
>>
>>
>>
>> On Tue, 23 Nov 2010, Nilay Vaish wrote:
>>
>>  Brad and I will be having a discussion today on how to resolve this
>>> issue.
>>>
>>> --
>>> Nilay
>>>
>>>
>>> On Tue, 23 Nov 2010, Steve Reinhardt wrote:
>>>
>>>  Thanks for tracking that down; that confirms my suspicions.
>>>>
>>>> I think the long-term answer is that the system needs to be reworked to
>>>> avoid having to do multiple tag lookups for a single access; I don't
>>>> know if
>>>> that's just an API change or if that's something that needs to be folded
>>>> into SLICCer.  (BTW, what is the status of SLICCer?  Is anyone working
>>>> on
>>>> it, or likely to work on it again?)
>>>>
>>>> In the short term, it's possible that some of the overhead can be
>>>> avoided by
>>>> building a "software cache" into isTagPresent(), by storing the last
>>>> address
>>>> looked up along with a pointer to the block, then just checking on each
>>>> call
>>>> to see if we're looking up the same address as last time and if so just
>>>> returning the same pointer before resorting to the hash table.  I hope
>>>> that
>>>> doesn't lead to any coherence problems with the block changing out from
>>>> under this cached copy... if so, perhaps an additional block check is
>>>> required on hits.
>>>>
>>>> Steve
>>>>
>>>>
>>>>
>>  _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to