If that's where a significant amount of time is being spent, we need to
either call it less or make it run faster :-).  Doing both is even better.

At a high level, the process of looking something up in an N-way associative
cache should not take that many instructions if N is small (a shift and an
add to find the tag index, at most N loads and compares to match the tags).
If we reorder the tags so that the MRU block is searched first, we can
probabilistically keep the average number of tags searched well below N.
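
As a rough sketch of that kind of lookup (illustrative types and names only,
not the actual CacheMemory code; it assumes 64-byte lines and a power-of-two
number of sets):

// Sketch of an N-way set-associative tag lookup with MRU-first search.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

class SketchCache {
  public:
    SketchCache(int numSets, int assoc)
        : setMask(numSets - 1),
          // ~uint64_t(0) marks an empty way
          sets(numSets, std::vector<uint64_t>(assoc, ~uint64_t(0))) {}

    // Returns true if the line address is present; keeps each set in
    // MRU-first order so hot lines match on the first compare.
    bool isTagPresent(uint64_t lineAddr)
    {
        // One shift and mask to pick the set...
        std::vector<uint64_t> &set = sets[(lineAddr >> 6) & setMask];

        // ...then at most N loads and compares.
        for (std::size_t way = 0; way < set.size(); ++way) {
            if (set[way] == lineAddr) {
                // Move the hit to the front so the average search
                // length stays well below N.
                std::rotate(set.begin(), set.begin() + way,
                            set.begin() + way + 1);
                return true;
            }
        }
        return false;
    }

  private:
    uint64_t setMask;
    std::vector<std::vector<uint64_t>> sets;
};

A newly allocated line would go into the same front slot; the point is just
that a hit on a recently touched line costs a single compare.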

Steve

On Fri, Nov 5, 2010 at 9:27 AM, Nilay Vaish <[email protected]> wrote:

> I had another look at the profile output. On the machine that I am using
> (a 3.2 GHz Pentium 4), each call to isTagPresent() takes about 57 ns.
> Assuming that the pipeline is functioning at its best, I think the number
> of uops executed would be ~500. Is that too much for this function?
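
(As a rough cross-check of that estimate, assuming a Pentium 4 can retire
about three uops per cycle at best: 57 ns at 3.2 GHz is 57 * 3.2 ≈ 182
cycles, and 182 * 3 ≈ 550 uops, which is in the same ballpark as the ~500
above.)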
>
> --
> Nilay
>
>
> On Fri, 5 Nov 2010, Nilay Vaish wrote:
>
>> Do you know what hash function is in use? It seems to me that the default
>> hash function is to hash to self. Maybe we should test with a different
>> hash function.
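
To make the concern concrete (purely illustrative, not the actual Ruby
code): "hash to self" would be an identity mapping like the first function
below, where the address bits are used directly as the bucket index, versus
something that mixes the bits first:

// Illustrative only: identity ("hash to self") vs. a simple mixing hash.
#include <cstdint>

// Identity: low line-address bits pick the bucket directly, so addresses
// that share those bits all collide.
inline uint64_t bucketIdentity(uint64_t lineAddr, uint64_t numBuckets)
{
    return lineAddr & (numBuckets - 1);   // assumes power-of-two bucket count
}

// One alternative: fold in higher bits before masking so strided or
// page-aligned addresses spread out more evenly.
inline uint64_t bucketMixed(uint64_t lineAddr, uint64_t numBuckets)
{
    uint64_t x = lineAddr ^ (lineAddr >> 17);   // any cheap mixer would do
    return x & (numBuckets - 1);
}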
>>
>> --
>> Nilay
>>
>> On Fri, 5 Nov 2010, Steve Reinhardt wrote:
>>
>>> You can look at the call graph profile further down in the gprof output
>>> to figure out how much time is spent in functions that get called from
>>> isTagPresent.  If it's not specifically calling out findTagInSet, it may
>>> be because it's inlined in isTagPresent.
>>>
>>> Steve
>>>
>>> On Fri, Nov 5, 2010 at 7:58 AM, Nilay Vaish <[email protected]> wrote:
>>>
>>>  I ran ALPHA_FS_MOESI_hammer using the following command --
>>>>
>>>> ./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py
>>>>
>>>> I don't know how the benchmark is picked in case none is specified.
>>>> Below
>>>> is the gprof output --
>>>>
>>>>
>>>>   %   cumulative    self               self    total
>>>>  time   seconds    seconds      calls  s/call  s/call  name
>>>> 19.72     51.22      51.22  925285266    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
>>>>  5.59     65.74      14.52  229035720    0.00    0.00  Histogram::add(long long)
>>>>  3.57     75.02       9.28  212664644    0.00    0.00  CacheMemory::lookup(Address const&)
>>>>  2.53     81.59       6.57   47830136    0.00    0.00  L1Cache_Controller::wakeup()
>>>>
>>>>
>>>> The output shows that about a fifth of the time is spent in the
>>>> isTagPresent() function.
>>>>
>>>> bool
>>>> CacheMemory::isTagPresent(const Address& address) const
>>>> {
>>>>   assert(address == line_address(address));
>>>>   Index cacheSet = addressToCacheSet(address);
>>>>   int loc = findTagInSet(cacheSet, address);
>>>>
>>>>   if (loc == -1) {
>>>>       // We didn't find the tag
>>>>       DPRINTF(RubyCache, "No tag match for address: %s\n", address);
>>>>       return false;
>>>>   }
>>>>   DPRINTF(RubyCache, "address: %s found\n", address);
>>>>   return true;
>>>> }
>>>>
>>>> Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert()
>>>> and the DPRINTF() calls are compiled out. The addressToCacheSet()
>>>> function only does a few bitwise and arithmetic operations, so it is not
>>>> expected to consume much time. Most likely, then, the findTagInSet()
>>>> function accounts for the major portion of the time spent in
>>>> isTagPresent().
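
For reference, the kind of per-set scan findTagInSet presumably boils down
to (a sketch under that assumption, with illustrative names, not the actual
gem5 source):

// Linear scan over one set's ways; cost is up to "associativity" loads and
// compares per call, which adds up over the ~925 million calls shown above.
#include <cstddef>
#include <cstdint>
#include <vector>

struct WaySketch {
    bool     valid;
    uint64_t lineAddr;
};

// Returns the way holding lineAddr in the given set, or -1 if absent.
int findTagInSetSketch(const std::vector<std::vector<WaySketch>> &sets,
                       int cacheSet, uint64_t lineAddr)
{
    const std::vector<WaySketch> &set = sets[cacheSet];
    for (std::size_t way = 0; way < set.size(); ++way) {
        if (set[way].valid && set[way].lineAddr == lineAddr)
            return static_cast<int>(way);
    }
    return -1;
}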
>>>>
>>>> --
>>>> Nilay
>>>>
>>>>
>>>>
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
