These are the profile results from testing ALPHA_FS_MESI_CMP_directory with
configs/example/ruby_fs.py. The simulation was allowed to run for
200,000,000,000 ticks.

Profile Result with unmodified SLICC

  %   cumulative     self                self    total
 time    seconds  seconds       calls  s/call   s/call  name
12.19      34.51    34.51   551229802    0.00     0.00  CacheMemory::isTagPresent(Address const&) const
 8.41      58.33    23.82    17760155    0.00     0.00  PerfectSwitch::wakeup()
 4.49      71.03    12.70   235904391    0.00     0.00  Histogram::add(long long)
 2.54      78.23     7.20   172127510    0.00     0.00  CacheMemory::lookup(Address const&)
 2.33      84.82     6.59    93838596    0.00     0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
 2.10      90.77     5.95   105280086    0.00     0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 2.06      96.61     5.84    34537891    0.00     0.00  BaseSimpleCPU::preExecute()
 1.95     102.12     5.51    43900461    0.00     0.00  RubyPort::M5Port::recvTiming(Packet*)
 1.93     107.58     5.46   580192104    0.00     0.00  Set::Set(Set const&)
 1.92     113.02     5.44    46506080    0.00     0.00  L1Cache_Controller::wakeup()

Result with modified SLICC

  %   cumulative     self                self    total
 time    seconds  seconds       calls  s/call   s/call  name
 9.97      24.78    24.78    17760155    0.00     0.00  PerfectSwitch::wakeup()
 5.42      38.27    13.49   101906879    0.00     0.00  CacheMemory::lookup_ptr(Address const&)
 5.32      51.50    13.23   235904391    0.00     0.00  Histogram::add(long long)
 2.30      57.21     5.71   580192104    0.00     0.00  Set::Set(Set const&)
 2.29      62.91     5.70    93838596    0.00     0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
 2.19      68.36     5.45    46506080    0.00     0.00  L1Cache_Controller::wakeup()
 2.14      73.67     5.31    34537891    0.00     0.00  BaseSimpleCPU::preExecute()
 2.10      78.89     5.22    11125106    0.00     0.00  MemoryControl::executeCycle()
 2.06      84.02     5.13    96775149    0.00     0.00  RubyEventQueueNode::process()
 1.98      88.94     4.92   105280086    0.00     0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 .
 .
 .
 1.30     121.31     3.23    51172611    0.00     0.00  CacheMemory::isTagPresent(Address const&) const

I can send the complete data generated by gprof, if required.
I have inlined my comments.
On Mon, 20 Dec 2010, Beckmann, Brad wrote:
Hi Nilay,
I apologize for the delay, but I was mostly travelling / in meetings last
week and I didn't have a chance to review your patches and emails until this
morning.
Overall, your patches are definitely solid steps in the right direction and
your profiling data sounds very promising. If you get the chance, please
send it to me. I would be interested to know what are the top performance
bottlenecks after your change.
Before you spend time converting the other protocols, I do want to discuss
the three points you brought up last week (see below). I have a bunch of
free time over the next three days (Mon. - Wed.) and I do think a telephone
conversation is best to discuss these details. Let me know what times work
for you.
The semester is over, so I am available almost throughout the day. Today,
I have a meeting at 3, which I think should be at most an hour long. Over
the next two days, I do not have anything scheduled so far, so any time
will work.
Brad
1. Currently the implicit TBE and Cache Entry pointers are set to NULL in the
calls to the doTransition() function. To set these, we would need to make
calls to a function that returns the pointer if the address is in the cache,
NULL otherwise.
I think we should retain the getEntry functions in the .sm files, because in
the case of the L1 cache both the instruction and the data cache need to be
checked. This is something that I would prefer to keep out of SLICC. In fact,
we should add getEntry functions for TBEs wherever required.
These getEntry functions would now return a pointer instead of a reference.
We would need to add support for return_by_pointer to SLICC. Also, since
these functions would be used inside the wakeup() function, we would need to
assume a common name for them across all protocols, just like the getState()
function.
[BB] I would be very interested to know why you believe we should keep the getEntry
functions out of SLICC. In my mind, this is one of the few functions that is
very consistent across protocols. As I mentioned before, I really want to
keep any notion of pointers out of the .sm files and avoid the changes you
are proposing to getCacheEntry. We should probably discuss this in detail
over-the-phone.
We would need to figure out which cache memories a machine has, their
hierarchy, and whether there are separate I and D caches. In fact,
MOESI-hammer has the L1I cache, L1D cache and L2 all in the same machine. I
think we should not do this analysis in the compiler.
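
To make the idea concrete, here is a minimal sketch of what a protocol-level
getEntry function could look like for a machine with split L1 caches. All of
the names (SimpleCache, Entry, lookupPtr, getCacheEntry) are hypothetical
stand-ins and not the actual Ruby classes; the only point is that one lookup
per cache returns either the entry pointer or NULL, and the protocol decides
which caches to check.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for Ruby's Address, cache entry and CacheMemory.
// The names and layout are illustrative only, not the gem5 definitions.
using Addr = std::uint64_t;

struct Entry {
    int state = 0;
};

class SimpleCache {
  public:
    // Single lookup: returns the entry, or nullptr when the tag is absent.
    Entry* lookupPtr(Addr addr) {
        auto it = m_entries.find(addr);
        return it == m_entries.end() ? nullptr : &it->second;
    }

    // Creates a default entry for addr if none exists yet.
    void allocate(Addr addr) { m_entries[addr]; }

  private:
    std::unordered_map<Addr, Entry> m_entries;
};

// Protocol-level helper for a machine with separate I and D caches.
// Checks the data cache first, then falls back to the instruction cache.
Entry* getCacheEntry(SimpleCache& icache, SimpleCache& dcache, Addr addr) {
    if (Entry* entry = dcache.lookupPtr(addr)) {
        return entry;
    }
    return icache.lookupPtr(addr);
}
```

The caller can test the returned pointer once and then reuse it, instead of
calling isTagPresent() followed by lookup().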
2. I still think we would need to change the changePermission function in the
CacheMemory class. Presently it calls findTagInSet() twice. Instead, we would
pass on the CacheEntry whose permissions need to be changed. This would save
one call. We should also put the variable m_locked in the AbstractCacheEntry
(maybe make it part of the permission variable) to avoid the second call.
[BB] I like moving the locked field to AbstractCacheEntry and removing the
separate m_locked data structure. However, just a minor point, but we should
avoid duplicating code in CacheMemory to support this change. Other than
that, this looks good to me.
I could not resist carrying out this change. Now changePermission() resides
in AbstractCacheEntry. To get this working, I had to change SLICC to support
calls to member functions of a base class using an object of a derived
class.
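
Roughly, the shape of the change is sketched below. This is a simplified
illustration and not the actual patch: the permission enum is cut down, the
-1 "unlocked" marker and the rule that an invalidated entry drops its lock
are assumptions, and L1CacheEntry is a hypothetical derived entry type.

```cpp
#include <cassert>

// Hypothetical, simplified permission set; the real enum has more values.
enum class AccessPermission { Invalid, ReadOnly, ReadWrite };

class AbstractCacheEntry {
  public:
    // The caller already holds the entry, so no tag lookup is needed here.
    void changePermission(AccessPermission perm) {
        m_permission = perm;
        // Assumption: an entry that becomes invalid also drops its lock.
        if (perm == AccessPermission::Invalid) {
            m_locked = kUnlocked;
        }
    }

    void setLocked(int context) { m_locked = context; }
    bool isLocked(int context) const { return m_locked == context; }
    AccessPermission permission() const { return m_permission; }

  private:
    static constexpr int kUnlocked = -1;  // assumed "no owner" marker
    AccessPermission m_permission = AccessPermission::Invalid;
    int m_locked = kUnlocked;  // lock state lives in the entry itself
};

// A protocol-specific entry inherits changePermission() from the base class,
// which is the kind of call SLICC has to be able to generate.
struct L1CacheEntry : AbstractCacheEntry {
    int cacheState = 0;
};
```

With the lock stored in the entry, changing a permission touches only the
object that the caller already looked up.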
3. In the getState() and setState() functions, we need to specify that the
function assumes that implicit TBE and CacheEntry pointers have been passed
as arguments. How should we do this? I think we would need to push them into
the symbol table before they can be used inside the function.
[BB] I'm a little confused by your current patch. It appears that you are
proposing having two pairs of getState and setState functions. I would
really like to avoid that and just have one pair of getState and setState
functions. Also when I say implicitly pass the TBE and CacheEntry pointers,
I mean that for the actions (similar to address). However, I think it is fine
to explicitly pass these parameters into getState and setState (also similar
to Address and State).
No, I am not proposing two different versions of these functions. SLICC
has always assumed that getState() and setState() functions exist. Using
this, I make SLICC push cache_entry and tbe into the symbol table when it
encounters declarations of getState() and setState(). As you pointed out,
this is similar to pushing the address variable in Action declarations. This
works fine.
I think the major problem is in letting a function use a pointer, for
example cache_entry_ptr, in two different forms - as 'cache_entry_ptr' and
as '*cache_entry_ptr'. The first form is used for passing to other
functions, the second form for editing the underlying object's data
members. Right now, in order to overcome this, I push two instances of
the variable onto the symbol table, one that will output the first form,
another that will output the second form.
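
As an illustration of the two-instance workaround, here is a toy symbol
table written in C++. It is not the SLICC implementation: SymbolTable,
pushImplicitPointer and the choice of which identifier maps to which form
are hypothetical, and the dereferenced form is emitted with parentheses so
that a following member access binds correctly.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical miniature of a code-generating symbol table: each identifier
// maps to the C++ text that is emitted wherever the identifier is used.
class SymbolTable {
  public:
    void push(const std::string& ident, const std::string& code) {
        m_vars[ident] = code;
    }

    // Returns the generated text; throws std::out_of_range if unknown.
    std::string emit(const std::string& ident) const {
        return m_vars.at(ident);
    }

  private:
    std::map<std::string, std::string> m_vars;
};

// Registers two instances for one implicit pointer parameter.
void pushImplicitPointer(SymbolTable& table, const std::string& ident) {
    // First form: the raw pointer, for passing to other functions.
    table.push(ident + "_ptr", ident + "_ptr");
    // Second form: the dereferenced object, for editing data members.
    table.push(ident, "(*" + ident + "_ptr)");
}
```

For example, after pushImplicitPointer(table, "cache_entry"), a use of
cache_entry generates (*cache_entry_ptr), while cache_entry_ptr is emitted
unchanged.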
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev