These profile results from testing ALPHA_FS_MESI_CMP_directory with configs/example/ruby_fs.py. The simulation was allowed to run for 200,000,000,000 ticks.

Profile Result with unmodified SLICC

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
12.19 34.51 34.51 551229802 0.00 0.00 CacheMemory::isTagPresent(Address const&) const
  8.41     58.33    23.82 17760155     0.00     0.00 PerfectSwitch::wakeup()
4.49 71.03 12.70 235904391 0.00 0.00 Histogram::add(long long) 2.54 78.23 7.20 172127510 0.00 0.00 CacheMemory::lookup(Address const&) 2.33 84.82 6.59 93838596 0.00 0.00 MessageBuffer::enqueue(RefCountingPtr<Message>, long long) 2.10 90.77 5.95 105280086 0.00 0.00 RubyEventQueue::scheduleEventAbsolute(Consumer*, long long) 2.06 96.61 5.84 34537891 0.00 0.00 BaseSimpleCPU::preExecute() 1.95 102.12 5.51 43900461 0.00 0.00 RubyPort::M5Port::recvTiming(Packet*)
  1.93    107.58     5.46 580192104     0.00     0.00  Set::Set(Set const&)
1.92 113.02 5.44 46506080 0.00 0.00 L1Cache_Controller::wakeup()


Result with modified SLICC

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  9.97     24.78    24.78 17760155     0.00     0.00 PerfectSwitch::wakeup()
5.42 38.27 13.49 101906879 0.00 0.00 CacheMemory::lookup_ptr(Address const&) 5.32 51.50 13.23 235904391 0.00 0.00 Histogram::add(long long)
  2.30     57.21     5.71 580192104     0.00     0.00  Set::Set(Set const&)
2.29 62.91 5.70 93838596 0.00 0.00 MessageBuffer::enqueue(RefCountingPtr<Message>, long long) 2.19 68.36 5.45 46506080 0.00 0.00 L1Cache_Controller::wakeup() 2.14 73.67 5.31 34537891 0.00 0.00 BaseSimpleCPU::preExecute() 2.10 78.89 5.22 11125106 0.00 0.00 MemoryControl::executeCycle() 2.06 84.02 5.13 96775149 0.00 0.00 RubyEventQueueNode::process() 1.98 88.94 4.92 105280086 0.00 0.00 RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
.
.
.
1.30 121.31 3.23 51172611 0.00 0.00 CacheMemory::isTagPresent(Address const&) const


I can send the complete data generated by gprof, if required.

I have inlined my comments.


On Mon, 20 Dec 2010, Beckmann, Brad wrote:

Hi Nilay,

I apologize for the delay, but I was mostly travelling / in meetings last week and I didn't have a chance to review your patches and emails until this morning.

Overall, your patches are definitely solid steps in the right direction and your profiling data sounds very promising. If you get the chance, please send it to me. I would be interested to know what are the top performance bottlenecks after your change.

Before you spend time converting the other protocols, I do want to discuss the three points you brought up last week (see below). I have a bunch of free time over the next three days (Mon. - Wed.) and I do think a telephone conversation is best to discuss these details. Let me know what times work for you.

The semester is over, so I am available almost throughout the day. Today, I have a meeting at 3, which I think should be at most an hour long. Over next two days, I do not have any thing scheduled so far. So any time will work.


Brad


1. Currently the implicit TBE and Cache Entry pointers are set to NULL in the calls to doTransition() function. To set these, we would need to make calls to a function that returns the pointer if the address is in the cache, NULL otherwise.

I think we should retain the getEntry functions in the .sm files for in case of L1 cache both instruction and the data cache needs to be checked. This is something that I probably would prefer keeping out of SLICC. In fact, we should add getEntry functions for TBEs where ever required.

These getEntry would now return a pointer instead of a reference. We would need to add support for return_by_pointer to SLICC. Also, since these functions would be used inside the Wakeup function, we would need to assume a common name for them across all protocols, just like getState() function.

[BB] I would be very interested why you believe we should keep the getEntry functions out of SLICC. In my mind, this is one of the few functions that is very consistent across protocols. As I mentioned before, I really want to keep any notion of pointers out of the .sm files and avoid the changes you are proposing to getCacheEntry. We should probably discuss this in detail over-the-phone.

We would need to figure out the cache memories machine has, their hierarchy, whether there are I and D caches. In fact, MOESI-hammer has L1I cache, L1D cache and L2 all in the same machine. I think we should not do this analysis in the compiler.


2. I still think we would need to change the changePermission function in the CacheMemory class. Presently it calls findTagInSet() twice. Instead, we would pass on the CacheEntry whose permissions need to be changed. This would save one call. We should also put the variable m_locked in the AbstractCacheEntry (may be make it part of the permission variable) to avoid the second call.

[BB] I like moving the locked field to AbstractCacheEntry and removing the separate m_locked data structure. However, just a minor point, but we should avoid duplicating code in CacheMemory to support this change. Other than that, this looks good to me.

Could not resist my self from carrying out this change. Now changePermission() resides in AbstractCacheEntry. To get this working, I had to change SLICC to support calls to member functions of base class using an object of derived class.


3. In the getState() and setState() functions, we need to specify that the function assumes that implicit TBE and CacheEntry pointers have been passed as arguments. How should we do this? I think we would need to push them in to the symbol table before they can be used in side the function.

[BB] I'm a little confused by your current patch. It appears that you are proposing having two pairs of getState and setState functions. I would really like to avoid that and just have one pair of getState and setState functions. Also when I say implicitly pass the TBE and CacheEntry pointers, I mean that for the actions (similar to address). However, I think it is fine to explicitly pass these parameters into getState and setState (also similar to Address and State).

No, I am not proposing two different versions of these functions. SLICC has always assumed that getState() and setState() functions exist. Using this, I make SLICC push cache_entry and tbe in to the Symbol Table when it encounters declarations of getState() and setState(). As you pointed out, this is similar to pushing address variable in Action declarations. This works fine.

I think the major problem is in letting a function use a pointer, for example cache_entry_ptr in two different forms - as 'cache_entry_ptr' and as '*cache_entry_ptr'. The first form is used for passing to other functions, the second form for editing the underlying object's data members. Right now, in order to over come this, I push two instances of the variable on to the symbol table, one that will output the first form, another that will output the second form.



_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to