Nice work!  No need to send the full profile, but what is the net speedup
here?  It seems like we should have eliminated about 10% of the runtime, but
I wanted to verify that.

Also, what workload are you running on top?  With all the time spent in
PerfectSwitch I'm guessing there's a lot of interconnect traffic; if you're
running the tester then that's not so bad, but if you're running a regular
program that seems high.

Thanks,

Steve

On Mon, Dec 20, 2010 at 9:47 AM, Nilay Vaish <[email protected]> wrote:

> These profile results from testing ALPHA_FS_MESI_CMP_directory with
> configs/example/ruby_fs.py. The simulation was allowed to run for
> 200,000,000,000 ticks.
>
> Profile Result with unmodified SLICC
>
>
>  %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  12.19     34.51    34.51 551229802     0.00     0.00
> CacheMemory::isTagPresent(Address const&) const
>  8.41     58.33    23.82 17760155     0.00     0.00 PerfectSwitch::wakeup()
>  4.49     71.03    12.70 235904391     0.00     0.00  Histogram::add(long
> long)
>  2.54     78.23     7.20 172127510     0.00     0.00
> CacheMemory::lookup(Address const&)
>  2.33     84.82     6.59 93838596     0.00     0.00
> MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
>  2.10     90.77     5.95 105280086     0.00     0.00
> RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
>  2.06     96.61     5.84 34537891     0.00     0.00
> BaseSimpleCPU::preExecute()
>  1.95    102.12     5.51 43900461     0.00     0.00
> RubyPort::M5Port::recvTiming(Packet*)
>  1.93    107.58     5.46 580192104     0.00     0.00  Set::Set(Set const&)
>  1.92    113.02     5.44 46506080     0.00     0.00
> L1Cache_Controller::wakeup()
>
>
> Result with modified SLICC
>
>
>  %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  9.97     24.78    24.78 17760155     0.00     0.00 PerfectSwitch::wakeup()
>  5.42     38.27    13.49 101906879     0.00     0.00
> CacheMemory::lookup_ptr(Address const&)
>  5.32     51.50    13.23 235904391     0.00     0.00  Histogram::add(long
> long)
>  2.30     57.21     5.71 580192104     0.00     0.00  Set::Set(Set const&)
>  2.29     62.91     5.70 93838596     0.00     0.00
> MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
>  2.19     68.36     5.45 46506080     0.00     0.00
> L1Cache_Controller::wakeup()
>  2.14     73.67     5.31 34537891     0.00     0.00
> BaseSimpleCPU::preExecute()
>  2.10     78.89     5.22 11125106     0.00     0.00
> MemoryControl::executeCycle()
>  2.06     84.02     5.13 96775149     0.00     0.00
> RubyEventQueueNode::process()
>  1.98     88.94     4.92 105280086     0.00     0.00
> RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
> .
> .
> .
>  1.30    121.31     3.23 51172611     0.00     0.00
> CacheMemory::isTagPresent(Address const&) const
>
>
> I can send the complete data generated by gprof, if required.
>
> I have inlined my comments.
>
>
>
> On Mon, 20 Dec 2010, Beckmann, Brad wrote:
>
>  Hi Nilay,
>>
>> I apologize for the delay, but I was mostly travelling / in meetings last
>> week and I didn't have a chance to review your patches and emails until this
>> morning.
>>
>> Overall, your patches are definitely solid steps in the right direction
>> and your profiling data sounds very promising.  If you get the chance,
>> please send it to me.  I would be interested to know what are the top
>> performance bottlenecks after your change.
>>
>> Before you spend time converting the other protocols, I do want to discuss
>> the three points you brought up last week (see below).  I have a bunch of
>> free time over the next three days (Mon. - Wed.) and I do think a telephone
>> conversation is best to discuss these details.  Let me know what times work
>> for you.
>>
>
> The semester is over, so I am available almost throughout the day. Today, I
> have a meeting at 3, which I think should be at most an hour long. Over next
> two days, I do not have any thing scheduled so far. So any time will work.
>
>
>
>> Brad
>>
>>
>> 1. Currently the implicit TBE and Cache Entry pointers are set to NULL in
>> the calls to doTransition() function. To set these, we would need to make
>> calls to a function that returns the pointer if the address is in the cache,
>> NULL otherwise.
>>
>> I think we should retain the getEntry functions in the .sm files for in
>> case of L1 cache both instruction and the data cache needs to be checked.
>> This is something that I probably would prefer keeping out of SLICC. In
>> fact, we should add getEntry functions for TBEs where ever required.
>>
>> These getEntry would now return a pointer instead of a reference. We would
>> need to add support for return_by_pointer to SLICC. Also, since these
>> functions would be used inside the Wakeup function, we would need to assume
>> a common name for them across all protocols, just like getState() function.
>>
>> [BB] I would be very interested why you believe we should keep the
>> getEntry functions out of SLICC.  In my mind, this is one of the few
>> functions that is very consistent across protocols.  As I mentioned before,
>> I really want to keep any notion of pointers out of the .sm files and avoid
>> the changes you are proposing to getCacheEntry.  We should probably discuss
>> this in detail over-the-phone.
>>
>
> We would need to figure out the cache memories machine has, their
> hierarchy, whether there are I and D caches. In fact, MOESI-hammer has L1I
> cache, L1D cache and L2 all in the same machine. I think we should not do
> this analysis in the compiler.
>
>
>
>> 2. I still think we would need to change the changePermission function in
>> the CacheMemory class. Presently it calls findTagInSet() twice. Instead, we
>> would pass on the CacheEntry whose permissions need to be changed. This
>> would save one call. We should also put the variable m_locked in the
>> AbstractCacheEntry (may be make it part of the permission variable) to avoid
>> the second call.
>>
>> [BB] I like moving the locked field to AbstractCacheEntry and removing the
>> separate m_locked data structure.  However, just a minor point, but we
>> should avoid duplicating code in CacheMemory to support this change. Other
>> than that, this looks good to me.
>>
>
> Could not resist my self from carrying out this change. Now
> changePermission() resides in AbstractCacheEntry. To get this working, I had
> to change SLICC to support calls to member functions of base class using an
> object of derived class.
>
>
>
>> 3. In the getState() and setState() functions, we need to specify that the
>> function assumes that implicit TBE and CacheEntry pointers have been passed
>> as arguments. How should we do this? I think we would need to push them in
>> to the symbol table before they can be used in side the function.
>>
>> [BB] I'm a little confused by your current patch.  It appears that you are
>> proposing having two pairs of getState and setState functions.  I would
>> really like to avoid that and just have one pair of getState and setState
>> functions.  Also when I say implicitly pass the TBE and CacheEntry pointers,
>> I mean that for the actions (similar to address). However, I think it is
>> fine to explicitly pass these parameters into getState and setState (also
>> similar to Address and State).
>>
>
> No, I am not proposing two different versions of these functions. SLICC has
> always assumed that getState() and setState() functions exist. Using this, I
> make SLICC push cache_entry and tbe in to the Symbol Table when it
> encounters declarations of getState() and setState(). As you pointed out,
> this is similar to pushing address variable in Action declarations. This
> works fine.
>
> I think the major problem is in letting a function use a pointer, for
> example cache_entry_ptr in two different forms - as 'cache_entry_ptr' and as
> '*cache_entry_ptr'. The first form is used for passing to other functions,
> the second form for editing the underlying object's data members. Right now,
> in order to over come this, I push two instances of the variable on to the
> symbol table, one that will output the first form, another that will output
> the second form.
>
>
>
>>
>> _______________________________________________
>> m5-dev mailing list
>> [email protected]
>> http://m5sim.org/mailman/listinfo/m5-dev
>>
>>  _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to